Transcript
Page 1: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Principles for Building Biomedical

Ontologies Suzanna Lewis

National Center Biomedical Ontology, 22 October 2005

Advanced Bioinformatics, Cold Spring Harbor

Page 2:

National Center Biomedical Ontology

http://bioontology.org/

Mark Musen: PI & Core 1: computer science (SMI)
Suzanna Lewis: Co-PI & Core 2: bioinformatics (BiKR; GO)
Barry Smith: Core 6: Outreach and training (ECOR)
Sima Misra: Associate Program Director
Daniel Rubin: Program Director
Michael Ashburner: Core 3: Phenotype Project (Cambridge; FlyBase; and GO)
Monte Westerfield: Core 3: Phenotype Project (UOregon; PI of ZFIN)
Ida Sim: Core 3: HIV clinical trials Project (UCSF)

Page 3:

BiKRs

Sima Misra Shu Shengqiang Christopher J. Mungall

Nomi Harris John Day-Richter Karen Eilbeck Mark Gibson

Page 4:

Outline for the Morning

A definition of “ontology”

Four sessions:

1. Organizational Challenges
2. Principles for Ontology Construction
3. Case Studies from the GO
4. Case Studies for group discussion

Page 5:

My newbie questions

What data is missing?
Where is the data generated?
What is the motivation?
How will it be gathered?

What I’ve heard

Organism, environment, data quality and attribution
TIGR, Sanger, JGI, and coming soon to a 954 near you!
Still an issue. Low threshold of effort relative to benefits of complying
Data is accumulating on disks across the world and we’d like to be able to locate and use it

The hardest part: Sharing (semantics)

Page 6:

handy ontology tells us what’s there…

Where should I eat…?

Ontologies help with decision making

Page 7:

Type of cuisine (Presumable) country of origin

Ontologies don’t just organize data; they also facilitate inference, and that creates new knowledge, often unconsciously in the user.

Page 8:

Where delicatessen food hails from…

‘Frozen Yogurt’ cuisine in search of a national identity?

What a computer would likely infer about the world from this helpful ontology:

Flag of fresh juice

Fresh Juice is a national cuisine…

Page 9:

Ontology is all about meaning

Communities form (scientific) theories that seek to explain all of the existing evidence and can be used for prediction

We make inferences and decisions based upon what we know about (biological) reality.

Page 10:

Make our meanings clear enough for a computer to understand

An ontology is a computable representation of this underlying (biological) reality.

An ontology enables a computer to reason over the data in (some of) the ways that we do, particularly to query and locate relevant data.

A shared, common, backbone taxonomy of relevant entities, and the relationships between them, within an application domain. Referred to by information scientists as an ‘Ontology’.

Page 11:

But really…

What is an Ontology? From Aristotle to Artificial Intelligence

It is “a formalism of what exists”

Follows formal rules for creating definitions originally laid down by Aristotle.

A definition is: the specification of the essence (nature, invariant structure) shared by all the members of a class or natural kind.

Page 12:

The Aristotelian Methodology

Topmost nodes are the undefinable primitives.

The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia.

Differentia tells us what marks out instances of the defined class within the wider parent class as in

Plasma membrane is a cell part [immediate parent] that surrounds the cytoplasm [differentia]
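The parent-plus-differentia pattern can be written down as a small data structure. This is only a minimal sketch: the class name `Definition`, its field names, and the `entity` primitive are hypothetical choices for illustration, not part of any ontology tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    """An Aristotelian definition: immediate parent (genus) plus differentia."""
    term: str
    genus: Optional[str] = None        # immediate parent; None for a topmost primitive
    differentia: Optional[str] = None  # what marks the class out within its parent

    def render(self) -> str:
        # Topmost nodes are undefinable primitives, so they get no definition text.
        if self.genus is None:
            return f"{self.term}: undefinable primitive"
        return f"{self.term} is a {self.genus} that {self.differentia}"


plasma_membrane = Definition("plasma membrane", "cell part", "surrounds the cytoplasm")
top = Definition("entity")  # hypothetical topmost node

print(plasma_membrane.render())
print(top.render())
```

Keeping the parent and the differentia in separate fields means a definition can be checked mechanically for having both parts.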

Page 13:

[Figure: a hierarchy of classes with instances beneath it. Class labels: physical object (substance), organism, animal, mammal, cat, Siamese, frog. Annotations: classes, leaf class, instances.]

all members of the class frog share a froggy nature

Page 14:

Thorax

Lung

Heart

Cell

Anatomical structures

Cornelius Rosse

Page 15:

Content of FMA

Challenge: Duplicate graphical model in symbolic model

Adapted from Bloom & Fawcett: Textbook of Histology, 12th ed., 1994, Chapman & Hall

Universals or classes: Kinds of anatomical entities

Page 16:

Content of FMA

Page 17:

1. Organizational Challenges

http://obo.sourceforge.net

Page 18:

So you want an ontology…

What do you have to do to make/get/use/steal/beg one?

Page 19:

[Flowchart: Why Survey. Decision questions (yes/no): Domain covered? Public? Active? Applied? Community? Outcomes: Improve, Develop, Salvage, Collaborate & Learn.]

Page 20:

What you must do

Justify exactly why there is a need

Scope it very, very tightly

Communicate with people

Page 21:

The decisions you must make

What domain does it cover?
Is it privately held?
Is it active?
Is it applied?

Page 22:

[Flowchart from Page 19 shown again: Why Survey; Domain covered? Public? Active? Applied? Community?; Improve, Develop, Salvage, Collaborate & Learn (Listen to Barry).]

Page 23:

Due diligence & background research

Step 1: Learn what is out there

The most comprehensive list is on the OBO site: http://obo.sourceforge.net

Assess ontologies critically and realistically.

Make contact

Page 24:

[Flowchart from Page 19 shown again: Why Survey; Domain covered? Public? Active? Applied? Community?; Improve, Develop, Salvage, Collaborate & Learn (Listen to Barry).]

Page 25:

Ontologies must be shared

Proprietary ontologies: belief that ownership of the terminology gives the owners a competitive edge

For example, Incyte or Monsanto in the past, SNOMED for non-US.

Data cannot be shared if the ontologies describing the data are not shared.

Don’t reinvent: use the power of combination and collaboration

Page 26:

[Flowchart from Page 19 shown again: Why Survey; Domain covered? Public? Active? Applied? Community?; Improve, Develop, Salvage, Collaborate & Learn (Listen to Barry).]

Page 27:

Pragmatic assessment of an ontology

Is there access to help, e.g.: [email protected]?

Does a warm body answer help mail within a ‘reasonable’ time, say 2 working days?

Page 28:

[Flowchart from Page 19 shown again: Why Survey; Domain covered? Public? Active? Applied? Community?; Improve, Develop, Salvage, Collaborate & Learn (Listen to Barry).]

Page 29:

Use it to improve it

Every ontology improves when it is applied to actual data

It improves even more when these data are used to answer questions

There will be fewer problems in the ontology and more commitment to fixing remaining problems when important research data is involved that scientists depend upon

Be very wary of ontologies that have never been applied

Page 30:

Work with that community

To improve (if you found one)
To develop (if you did not)

Getting it right

It is impossible to get it right the 1st (or 2nd, or 3rd, …) time.

What we know about reality is continually growing

Improve

Collaborate and Learn

Page 31:

Implication: “prepare for change”

Establish a mechanism for change.

Use CVS or Subversion.
Changes must be reviewed by experts

Unique Identifiers
Versions
Archives
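One way to picture unique identifiers, versions, and archives working together is a small registry in which an identifier is never reused and every change is kept as a numbered snapshot. This is a sketch under stated assumptions: `TermRegistry`, the `EX` prefix, and the snapshot scheme are hypothetical, and expert review of changes is not modelled here.

```python
class TermRegistry:
    """Toy term store: identifiers are never reused and each change is archived."""

    def __init__(self, prefix: str = "EX") -> None:
        self.prefix = prefix      # hypothetical identifier namespace
        self.version = 0          # incremented on every change
        self.archive = []         # list of (version, snapshot) pairs
        self._next_number = 1
        self._terms = {}          # identifier -> {"name": ..., "obsolete": ...}

    def _commit(self) -> None:
        # Record a copy of the whole term set under a new version number.
        self.version += 1
        snapshot = {tid: dict(term) for tid, term in self._terms.items()}
        self.archive.append((self.version, snapshot))

    def add(self, name: str) -> str:
        term_id = f"{self.prefix}:{self._next_number:07d}"
        self._next_number += 1
        self._terms[term_id] = {"name": name, "obsolete": False}
        self._commit()
        return term_id

    def obsolete(self, term_id: str) -> None:
        # Obsolete terms are flagged, not deleted, so old annotations still resolve.
        self._terms[term_id]["obsolete"] = True
        self._commit()

    def get(self, term_id: str) -> dict:
        return self._terms[term_id]


registry = TermRegistry()
first = registry.add("bud initiation")
registry.obsolete(first)
second = registry.add("bud initiation")  # same name, new identifier
```

In practice the snapshots would live in a version control system such as CVS or Subversion, as the slide suggests; the in-memory list only stands in for that.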

Page 32:

Ontology development is hard

Have a stake in seeing it work.

Have broad, detailed domain knowledge.

Will engage in vigorous debate without engaging egos.

Will do concrete work and attend frequent working sessions (quarterly), phone conferences (weekly), e-mail correspondence (daily).

Page 33:

2. Principles for Ontology Construction

Page 34:

Why do we need rules for good ontology?

Ontologies must be intelligible to humans (for annotation) and to machines (for reasoning and error-checking)

Unintuitive rules for classification lead to entry errors (problematic links)

Facilitate training of curators

Overcome obstacles to alignment with other ontology and terminology systems

Enhance harvesting of content through automatic reasoning systems

Following basic rules makes more useful ontologies

Page 35:

Aristotle’s categories

This is Aristotle’s list of types of predication, that is, the different ways in which things can be said to be. He identifies 10 mutually exclusive categories.

1. Substance
2. Quantity
3. Quality
4. Relation
5. Location
6. Time
7. Position
8. Possession
9. Doing
10. Undergoing

Page 36:

SNOMED-CT Top Level

Substance
Body Structure
Specimen
Context-Dependent Categories*
Attribute
Finding*
Staging and Scales
Organism
Physical Object
Events
Environments and Geographic Locations
Qualifier Value
Special Concept*
Pharmaceutical and Biological Products
Social Context
Disease
Procedure
Physical Force

Page 37:

Examples of Rules

Don’t confuse instances with universals

Your navel (instance) is not the abstract representation of all navels

Your microarray result is not the abstract representation of all microarray results

The meaning of an ontology should not change when the programming language changes

Page 38:

First Rule: Univocity

Terms (including those describing relations) should have the same meanings on every occasion of use.

In other words, they should refer to the same kinds of instances in reality

Page 39:

Example of univocity problem in case of part_of relation

(Old) Gene Ontology:

‘part_of’ = ‘may be part of’: flagellum part_of cell
‘part_of’ = ‘is at times part of’: replication fork part_of the nucleoplasm
‘part_of’ = ‘is included as a sub-list in’

Page 40:

Second Rule: Positivity

Complements of classes are not themselves classes.

Terms such as ‘non-mammal’, or ‘non-frog’, or ‘non-membrane’ do not designate genuine classes.

Page 41:

Third Rule: Objectivity

Which classes exist is not a function of our biological knowledge.

Terms such as ‘unknown’ or ‘unclassified’ do not designate biological natural kinds.
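The positivity and objectivity rules lend themselves to a simple name check run over a term list. The sketch below is a heuristic only; the function name `lint_term`, the prefix pattern, and the word list are assumptions made for illustration and will miss cases a curator would catch.

```python
import re

# Names that begin with a negating prefix followed by a separator ("non-frog").
NEGATIVE_PREFIX = re.compile(r"^(non|not)[-_ ]", re.IGNORECASE)

# Words that describe the state of our knowledge rather than a natural kind.
KNOWLEDGE_WORDS = {"unknown", "unclassified", "unlocalized"}


def lint_term(name: str) -> list:
    """Return the rules a term name appears to break (empty list if none)."""
    problems = []
    if NEGATIVE_PREFIX.match(name):
        problems.append("positivity")
    words = set(re.split(r"[^a-z]+", name.lower()))
    if words & KNOWLEDGE_WORDS:
        problems.append("objectivity")
    return problems


for candidate in ["non-mammal", "molecular function unknown", "plasma membrane"]:
    print(candidate, lint_term(candidate))
```

A check like this flags candidates for review; it cannot decide on its own whether a flagged term should stay.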

Page 42:

Fourth Rule: Single Inheritance

No class in a classificatory hierarchy should have more than one is_a parent on the immediate higher level

I.e. no diamonds

[Diagram: A is_a1 B and A is_a2 C, so A has two is_a parents]
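Checking a hierarchy against the single inheritance rule is a matter of counting is_a parents. A minimal sketch, assuming the hierarchy is held as a dictionary from each class to the set of its immediate is_a parents (the function name is hypothetical):

```python
def multiple_is_a_parents(is_a: dict) -> dict:
    """Map each class with more than one immediate is_a parent to those parents."""
    return {
        child: sorted(parents)
        for child, parents in is_a.items()
        if len(parents) > 1
    }


# The situation in the diagram: A has both B and C as is_a parents.
hierarchy = {"A": {"B", "C"}, "B": set(), "C": set()}
print(multiple_is_a_parents(hierarchy))
```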

Page 43:

Following the single inheritance rule

The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.

The entire information content of the term hierarchy can be translated very cleanly into a computer representation

Page 44:

Problems with multiple inheritance

[Diagram: A is_a1 B and A is_a2 C]

‘is_a’ no longer univocal

Page 45:

Fifth Rule: Clarity of Text Definitions

The terms used in a definition should be simpler (more intelligible) than the term to be defined

otherwise the definition provides no assistance to human understanding

Machines can cope with the full formal representation (they don’t need the text)

Page 46:

Sixth Rule: Basis in Reality

When building or maintaining an ontology, always think carefully about how classes (types, kinds, species) relate to instances in reality

Axioms governing instances

Every class has at least one instance (exceptions will occur at top levels)

Each child class has a smaller collection of instances than its parent class

Page 47:

Axiom: Every parent class has at least two children
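The three axioms (at least one instance per class, fewer instances in a child than in its parent, at least two children per parent) can be tested against a toy model in which each class carries an explicit set of instances. This is a sketch only: real ontologies rarely enumerate instances, and the function name, the messages, and the example animals are hypothetical. It also reads "smaller collection" as "proper subset", which is an interpretation.

```python
def check_instance_axioms(children: dict, instances: dict) -> list:
    """children: parent class -> set of child classes.
    instances: class -> set of all instances of that class."""
    problems = []
    for cls in sorted(instances):
        if not instances[cls]:
            problems.append(f"{cls}: has no instances")
    for parent in sorted(children):
        kids = children[parent]
        if len(kids) == 1:
            problems.append(f"{parent}: has only one child")
        for kid in sorted(kids):
            # '<' on sets is the proper-subset test.
            if not instances.get(kid, set()) < instances.get(parent, set()):
                problems.append(f"{kid}: instances are not a proper subset of {parent}")
    return problems


good = check_instance_axioms(
    {"animal": {"cat", "frog"}},
    {"animal": {"felix", "kermit"}, "cat": {"felix"}, "frog": {"kermit"}},
)
bad = check_instance_axioms(
    {"animal": {"cat"}},
    {"animal": {"felix"}, "cat": {"felix"}},
)
print(good)
print(bad)
```

The second call shows why a single child is suspicious: with only one child, parent and child end up with the same instances and the distinction between them does no work.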

Page 48:

The reason that rules are important:

Interoperability

Ontologies should work together
Avoid redundancy in ontology building

Support reuse

Ontologies should be capable of being used by other ontologies (cumulation)

Page 49:

The problem of ontology re-use

SNOMED, MeSH, UMLS, NCIT, HL7-RIM, …

None of these have clearly defined relations

Still remain too much at the level of TERMINOLOGY

Not based on a common set of rules

Not based on a common set of relations

Page 50:

An example of unclear relationship use

A is_a B: ‘A’ is more specific in meaning than ‘B’

HL7-RIM: Individual Allele is_a Act of Observation

cancer documentation is_a cancer
disease prevention is_a disease

Page 51:

How to define A is_a B

A is_a B = def.

• A and B are names of universals (natural kinds, types) in reality

• all instances of A are as a matter of biological science also instances of B
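Written as a formula, the second condition might read as follows. This is an untimed simplification, and the predicate name instance_of is an assumption borrowed from the relation names used elsewhere in these slides:

```latex
A \;\mathrm{is\_a}\; B \;=_{\mathrm{def}}\; \forall x \,\bigl(\mathrm{instance\_of}(x, A) \rightarrow \mathrm{instance\_of}(x, B)\bigr)
```

Here A and B range over universals and x over instances, so the definition links the class-level relation to facts about individual instances.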

Page 52:

Benefits of well-defined relationships

If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C). Relations used in ontologies thus far have not been well defined in this sense.

Find all DNA binding proteins should also find all transcription factor proteins because Transcription factor is_a DNA binding protein
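The DNA binding protein query can be sketched as a search over the is_a closure. The gene product names and the flat annotation table below are hypothetical, and a real system would index the closure rather than recompute it for each query.

```python
def subclasses(term: str, is_a: dict) -> set:
    """The term plus every class that reaches it through one or more is_a links."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parents in is_a.items():
            if child not in found and parents & found:
                found.add(child)
                changed = True
    return found


def find_annotated(term: str, is_a: dict, annotations: dict) -> list:
    """Gene products annotated to the term or to any of its subclasses."""
    wanted = subclasses(term, is_a)
    return sorted(gp for gp, annotated_to in annotations.items() if annotated_to in wanted)


is_a = {"transcription factor": {"DNA binding protein"}}
annotations = {  # hypothetical gene products
    "gene_product_1": "DNA binding protein",
    "gene_product_2": "transcription factor",
    "gene_product_3": "kinase",
}
print(find_annotated("DNA binding protein", is_a, annotations))
```

The query returns the transcription factor without anyone having annotated it to DNA binding protein directly, which is the cascade the slide describes.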

Page 53:

Biomedical data integration / interoperability

Will never be achieved through integration of meanings or concepts

The problem: different user communities use different concepts

What is really needed is a well-defined, commonly used set of relationships

Page 54:

Seventh Rule: Distinguish Universals and Instances

A good ontology must distinguish clearly between universals (types, kinds, classes) and instances (tokens, individuals, particulars)

Page 55:

Why distinguish classes from instances?

What holds on the level of instances may not hold on the level of universals

For example, my definition of an “adjacent_to” relation requires that it work in either direction

(This particular) nucleus adjacent_to (this particular) cytoplasm: always true

Cytoplasm adjacent_to nucleus: not always true
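The difference between the two levels shows up in a toy model. The sketch assumes one common reading of a class-level statement "A adjacent_to B", namely that every instance of A is adjacent to some instance of B; the slides do not spell this reading out, and the instance names (including a cell without a nucleus) are hypothetical.

```python
def symmetric(pairs: set) -> set:
    """Close an instance-level relation so it works in either direction."""
    return set(pairs) | {(b, a) for a, b in pairs}


def holds_for_classes(relation: set, instances_a: set, instances_b: set) -> bool:
    """Assumed class-level reading: every instance of A is related to some instance of B."""
    return all(
        any((a, b) in relation for b in instances_b)
        for a in instances_a
    )


# Instance level: adjacency is symmetric by construction.
adjacent_to = symmetric({("nucleus_1", "cytoplasm_1")})

nuclei = {"nucleus_1"}
cytoplasms = {"cytoplasm_1", "cytoplasm_2"}  # cytoplasm_2 belongs to a cell with no nucleus

print(holds_for_classes(adjacent_to, nuclei, cytoplasms))   # nucleus adjacent_to cytoplasm
print(holds_for_classes(adjacent_to, cytoplasms, nuclei))   # cytoplasm adjacent_to nucleus
```

Under this reading the instance-level relation is symmetric while the class-level one is not, because one cytoplasm has no nucleus next to it.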

Page 56:

Using relations

Between classes: is_a, part_of, ...

Between an instance and a class: this explosion instance_of the class explosion

Between instances: Mary’s heart part_of Mary

Relations must be defined to always work

Page 57:

Defining the part_of relation can be a problem

part_of as a relation between classes versus part_of as a relation between instances

nucleus part_of cell (classes)
your heart part_of you (instances)

testis part_of human being?
heart part_of human being?
human being has_part human testis?

Page 58:

Similar considerations are required to clearly define nearly all relations

A causes B
A is_located in B
A is_adjacent_to B
A derives_from B (Zygote derives_from ovum, sperm)
A transformation_of B (Adult transformation_of child)

Page 59:

The Rules

1. Univocity: Terms should have the same meanings on every occasion of use

2. Positivity: Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine classes.

3. Objectivity: Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.

4. Single Inheritance: No class in a classification hierarchy should have more than one is_a parent on the immediate higher level

5. Intelligibility of Definitions: The terms used in a definition should be simpler (more intelligible) than the term to be defined

6. Basis in Reality: When building or maintaining an ontology, always think carefully about how classes relate to instances in reality

7. Distinguish Classes and Instances

Page 60:

Some rules are Rules of Thumb

The world is full of difficult trade-offs

The benefits of formal (logical and ontological) rigor need to be balanced

against the constraints of computer tractability,
against the needs of biomedical practitioners.

BUT do the very best you can!

Page 61:

3. Case Studies from the GO

http://www.geneontology.org

Page 62:

How has GO dealt with some specific aspects of ontology development?

Univocity
Positivity
Objectivity
Definitions: formal definitions, written definitions
Ontology Re-use (Alignment)

Page 63:

The Challenge of Univocity: People call the same thing by different names

Tactile sense
Taction
Tactition

?

Page 64:

Univocity: GO uses one term and many characterized synonyms

Tactile sense
Taction
Tactition

perception of touch ; GO:0050975
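The one-term-many-synonyms approach amounts to an index from every label to a single identifier. A minimal sketch using the term and synonyms from this slide; the dictionary layout and function names are hypothetical and not GO's file format.

```python
from typing import Optional

TERMS = {
    "GO:0050975": {
        "name": "perception of touch",
        "synonyms": ["tactile sense", "taction", "tactition"],
    },
}


def build_label_index(terms: dict) -> dict:
    """Map every name and synonym (lower-cased) to its single term identifier."""
    index = {}
    for term_id, term in terms.items():
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()] = term_id
    return index


def resolve(label: str, index: dict) -> Optional[str]:
    """Return the identifier for a label, or None if the label is not known."""
    return index.get(label.strip().lower())


label_index = build_label_index(TERMS)
print(resolve("Taction", label_index))
```

However a user phrases the query, annotations are stored against the one identifier, so the data stays univocal even though the vocabulary people use does not.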

Page 65:

= bud initiation

= bud initiation

= bud initiation

The Challenge of Univocity: People use the same words to describe different things

Page 66:

Bud initiation? How is a computer to know?

Page 67:

= bud initiation sensu Metazoa

= bud initiation sensu Saccharomyces

= bud initiation sensu Viridiplantae

Univocity: GO adds “sensu” descriptors to discriminate among organisms

Page 68:

The Challenge of Positivity

Some organelles are membrane-bound. A centrosome is not a membrane-bound organelle, but it still may be considered an organelle.

Page 69:

The Challenge of Positivity: Sometimes absence is a distinction in a Biologist’s mind

non-membrane-bound organelle: GO:0043228

membrane-bound organelle: GO:0043227

Page 70:

Positivity

Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle”

The latter includes everything that is not a membrane bound organelle!

Page 71:

The Challenge of Objectivity: Database users want to know if we don’t know anything (Exhaustiveness with respect to knowledge)

We don’t know anything about a gene product with respect to these

We don’t know anything about the ligand that binds this type of GPCR

Page 72:

Objectivity

How can we use GO to annotate gene products when we know that we don’t have any information about them?

Currently GO has terms in each ontology to describe unknown

An alternative might be to annotate genes to root nodes and use an evidence code to describe that we have no data.

Similar strategies could be used for things like receptors where the ligand is unknown.
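The root-node alternative can be sketched as an annotation helper that falls back to the root of the relevant ontology with a "no data" evidence code. The root identifiers below are GO's three root terms and ND is GO's "No biological Data available" code; using them this way is an assumption of the sketch rather than something the slide specifies, and the function and gene product names are hypothetical.

```python
from typing import Optional

# Root terms of the three GO ontologies.
ROOTS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}

NO_DATA = "ND"  # evidence code meaning: no biological data available


def annotate(gene_product: str, aspect: str,
             term_id: Optional[str] = None, evidence: Optional[str] = None) -> tuple:
    """Return (gene product, term, evidence); fall back to the root when nothing is known."""
    if term_id is None:
        return (gene_product, ROOTS[aspect], NO_DATA)
    return (gene_product, term_id, evidence)


# A hypothetical gene product whose function has been looked for and not found.
print(annotate("gene_product_X", "molecular_function"))
```

The statement "we looked and found nothing" then lives in the evidence code, and the ontology itself contains no class named unknown.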

Page 73:

GPCRs with unknown ligands

We could annotate to this

Page 74:

GO Definitions

A definition written by a biologist: necessary & sufficient conditions; written definition (not computable)

Graph structure: necessary conditions; formal (computable)

Page 75:

Relationships and definitions

Important considerations:

Placement in the graph: selecting parents

Appropriate relationships to different parents

True path violation

Page 76:

True path violation: What is it?

“the path from a child term all the way up to its top-level parent(s) must always be true”.

[Diagram labels: chromosome, mitochondrial chromosome, nucleus. Edge types: is_a relationship, part_of relationship.]

Page 77:

True path violation: What is it?

[Diagram labels: nucleus, chromosome, nuclear chromosome, mitochondrial chromosome. Edge types: is_a relationships, part_of relationship.]
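A true path check can be sketched by walking up from a term and collecting every whole it would be inferred to be part of. The edges below are an assumed reading of the two diagrams: first, mitochondrial chromosome is_a chromosome with chromosome part_of nucleus; then the repair, where only nuclear chromosome is part_of nucleus. The function name is hypothetical.

```python
def inferred_wholes(term: str, is_a: dict, part_of: dict) -> set:
    """Collect every part_of target reachable from the term via is_a and part_of links."""
    wholes = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        wholes |= part_of.get(current, set())
        stack.extend(is_a.get(current, set()))
        stack.extend(part_of.get(current, set()))
    return wholes


# Assumed first diagram: the path from mitochondrial chromosome leads into the nucleus.
bad_is_a = {"mitochondrial chromosome": {"chromosome"}}
bad_part_of = {"chromosome": {"nucleus"}}

# Assumed second diagram: part_of nucleus is attached to nuclear chromosome only.
fixed_is_a = {
    "nuclear chromosome": {"chromosome"},
    "mitochondrial chromosome": {"chromosome"},
}
fixed_part_of = {"nuclear chromosome": {"nucleus"}}

print(inferred_wholes("mitochondrial chromosome", bad_is_a, bad_part_of))
print(inferred_wholes("mitochondrial chromosome", fixed_is_a, fixed_part_of))
```

In the first graph the mitochondrial chromosome is inferred to be part of the nucleus, which is false, so the path is not true; in the second that inference disappears. The code only lists inferences, and judging whether each one is biologically true remains the curator's job.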

Page 78:

The Importance of synonyms: is tRNA a function?

Molecular_function

Triplet codon amino acid adaptor activity

GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis.

Synonym: tRNA

Page 79:

Ontology integration

One of the current goals of GO is integration: references to cell types in GO with cell types in the Cell Ontology

References to Cell Types in GO: Cell Types in the Cell Ontology

cone cell fate commitment: retinal_cone_cell
keratinocyte differentiation: keratinocyte
adipocyte differentiation: fat_cell
dendritic cell activation: dendritic_cell
lymphocyte proliferation: lymphocyte
T-cell homeostasis: T_lymphocyte
garland cell differentiation: garland_cell
heterocyst cell differentiation: heterocyst

Page 80:

We can integrate the GO with other ontologies

Chemical ontologies: 3,4-dihydroxy-2-butanone-4-phosphate synthase activity

Anatomy ontologies: metanephros development

GO itself: mitochondrial inner membrane peptidase activity

Nota bene: some time and effort will be required

Page 81:

Building Ontology

Improve

Collaborate and Learn

Page 82:

Applied Ontology: a summary

Dedicated editors
Practice good ontological hygiene
Engage the community
Reward compliance and get the ontology into use
Plan for change over time
KISS: Concentrate on what you can definitely agree upon: the steps you can take with certainty.

Page 83:

4. Case Studies for group discussion

Page 84: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

mitosis and meiosis

It's been a full lunar cycle since we last talked about this on the mailing list, and I would like to draw everyone's attention once again to the exciting topics of chromosome segregation, nuclear division and cell division. The basic problem is the multiplicity of meanings attached to 'mitosis'. The word is used in the literature and colloquially to represent everything from chromosome segregation up to a full round of nuclear and cell division, and there is no consensus on how to define it in scientific or general dictionaries (check www.onelook.com for proof). To compound the problem, the only process common to all species which undergo 'mitosis' is chromosome segregation; not all species undergo nuclear division or cell division during the processes described in the literature as 'mitosis'. In the ontologies, we currently have 'mitosis' defined as chromosome segregation and nuclear division. This is therefore wrong for those species in which there is no nuclear division accompanying chromosome segregation. How are we going to define mitosis?

Page 85: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Events of the mitotic cell cycle that need to be represented:

mitotic chromosome segregation

mitotic nuclear division

mitotic cell division

The only component common to all of these is mitotic chromosome segregation.

The structure must be flexible enough to accommodate any of the flavors of 'mitosis', no matter what the species and no matter whether the annotator has read the definition or not.
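
One way to see why chromosome segregation is the only safe core of a definition is to treat each organism's 'mitosis' as a set of events and take the intersection. The sketch below does that. The organisms are hypothetical placeholders, not real annotation data, and the names are invented for this example.

```python
# The three events named on the slide.
SEGREGATION = "mitotic chromosome segregation"
NUCLEAR_DIVISION = "mitotic nuclear division"
CELL_DIVISION = "mitotic cell division"

# Hypothetical organisms (placeholders only): each maps to the events that
# occur during what its literature calls 'mitosis'.
EVENTS_BY_ORGANISM = {
    "organism_1": {SEGREGATION, NUCLEAR_DIVISION, CELL_DIVISION},
    "organism_2": {SEGREGATION, NUCLEAR_DIVISION},
    "organism_3": {SEGREGATION},
}


def common_events(events_by_organism):
    """Return the events shared by every organism in the mapping."""
    return set.intersection(*events_by_organism.values())
```

With these placeholder sets, a term defined as "segregation plus nuclear division" would be wrong for organism_3, which is the problem the mailing-list message describes.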

Page 86: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor
Page 87: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Backing up assertions

QUESTION: What evidence code is appropriate to use for statements of “common knowledge”?

Page 88: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

The current documentation states that TAS may be used as the evidence code for statements of common knowledge. For example, let’s say you have a paper that says that Protein X is an xxxxx, with a direct assay for activity, so you can use IDA for this function term. The paper also makes a mutation in the gene for Protein X and shows that it is involved in process yyyy, so you can use IMP for the process term. But the paper does not have any direct evidence about the localization of Protein X. However, everyone knows that process yyyy occurs in the cytoplasm, so you can annotate Protein X to the component term “cytoplasm ; GO:5737” by TAS using a general reference like Biochemistry by Lubert Stryer.

Page 89: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

There is not really a traceable statement in Stryer providing evidence that process yyyy occurs in this location in yeast.

SGD feels that it is better to use the newer evidence code IC for these “common knowledge” types of annotations. Thus, if an SGD curator felt that it was reasonable to make the annotation “cytoplasm” based on the knowledge that Protein X has the process annotation yyyy, then the curator could assign the component term “cytoplasm ; GO:5737” using IC and the GO ID of the process term yyyy.
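
The Protein X example can be written down as three annotation records, which makes the difference between TAS and IC concrete: an IC annotation carries a pointer to the term it was inferred from. This is a minimal sketch under assumptions. The `Annotation` class and its field names are invented here and are not the GO annotation file format; "xxxxx" and "yyyy" are the slide's own placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Evidence codes mentioned on the slides, with their expansions.
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "TAS": "Traceable Author Statement",
    "IC": "Inferred by Curator",
}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record type for this sketch only."""
    gene_product: str
    term: str
    evidence: str
    reference: str
    inferred_from: Optional[str] = None  # for IC: the term the inference rests on

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")
        if self.evidence == "IC" and not self.inferred_from:
            raise ValueError("IC annotations must name the term they are inferred from")


# The three annotations described on the slides, using the slide's placeholders.
annotations = [
    Annotation("Protein X", "function xxxxx", "IDA", "the paper"),
    Annotation("Protein X", "process yyyy", "IMP", "the paper"),
    Annotation("Protein X", "cytoplasm ; GO:5737", "IC", "curator judgment",
               inferred_from="process yyyy"),
]
```

Requiring `inferred_from` for IC keeps the chain of reasoning traceable, which is the point SGD makes against citing a general textbook under TAS.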

Page 90: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Many of these “common knowledge” types of statements are not well based in actual experiments conducted on the organism of interest. Early biochemists would often perform experiments with materials that were easy to obtain, e.g. calf thymus, and assume that this accurately represented the situation in another organism, e.g. human. This may or may not be the case.

Page 91: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

What is the most appropriate GO term for annotating a response to methylmercury? "Response to mercury ion" doesn't seem quite right, as it specifically states that the response is "as a result of exposure to mercuric ions (Hg2+)", but the more general-sounding "response to mercury" is a synonym of it. In the publication I am working on, they exposed zebrafish to methylmercury and documented the resulting changes in gene expression.

Page 92: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

"Response to mercury ion"

Definition: A change in state or activity of the organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of exposure to mercuric ions (Hg2+).

Synonyms: response to mercuric, response to mercury
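
The annotator's difficulty can be reproduced with a small lookup: a search for the general phrase lands on the narrow term through its synonym, while the definition still restricts the term to Hg2+. The sketch below assumes a made-up `Term` record and `find_terms` helper; only the name, definition and synonyms come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical term record for this sketch only."""
    name: str
    definition: str
    synonyms: list = field(default_factory=list)


# Content copied from the slide.
MERCURY_TERM = Term(
    name="response to mercury ion",
    definition=(
        "A change in state or activity of the organism (in terms of movement, "
        "secretion, enzyme production, gene expression, etc.) as a result of "
        "exposure to mercuric ions (Hg2+)."
    ),
    synonyms=["response to mercuric", "response to mercury"],
)


def find_terms(query, terms):
    """Return (term, matched_via) pairs whose name or synonym equals the query."""
    q = query.strip().lower()
    hits = []
    for term in terms:
        if term.name.lower() == q:
            hits.append((term, "name"))
        elif q in (s.lower() for s in term.synonyms):
            hits.append((term, "synonym"))
    return hits
```

A search for "response to mercury" matches by synonym, yet nothing matches "response to methylmercury", so the annotator has to decide whether the synonym or the definition governs.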

Page 94: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Bloggers and other online groups (e.g. del.icio.us, Flickr [online photo archive], Technorati) have been self-categorizing or 'tagging' web sites and their content using user-defined words and phrases and not an expertly curated vocabulary or ontology. The end result is that a vast amount of content has been indexed using a rich vocabulary of tags (to date, Technorati has over 1.2 billion links tagged with 1.2 million tags).

Whilst this certainly lacks the formal consistency that would be obtained with curated annotation against a standard vocabulary, the quantity of content being categorized far exceeds what could be done by a group of annotators and perhaps is richer because the tags are defined by the users and creators of that content, not by a third party interpreting the material after the fact.

Given the ever increasing quantity of scientific data, the proliferation of online publishing, etc., could scientists tagging their own data with their own terms be the way to go?

Page 95: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

How can you recruit and train people, in both logic and biology, given that without a sufficient number of competent personnel the ontology cannot be maintained?

Page 96: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Thanks to NIH and HHMI for funding and support

And to my fantastic colleagues (whose slides these are)

MICHAEL ASHBURNER, BARRY SMITH, DAVID HILL, CORNELIUS ROSSE & CHRIS MUNGALL

Page 97: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

P.S. Graphical User Interfaces

Semantics

Page 98: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Common pitfalls

Don’t confuse instances with artifacts of your database representation...

Page 99: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

part_of

part_of must be time-indexed for spatial classes

A part_of B is defined as: Given any instance a and any time t, if a is an instance of the universal A at t, then there is some instance b of the universal B such that a is an instance-level part_of b at t
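
The definition of class-level part_of can be checked mechanically against time-indexed instance data. The sketch below is an illustration under assumptions: the tuples, instance names and the function `class_part_of` are hypothetical and are not taken from any ontology tool.

```python
# Hypothetical instance-level facts, each indexed by a time point.
# INSTANCE_OF holds (instance, class, time); PART_OF holds (part, whole, time).
INSTANCE_OF = {
    ("a1", "A", 1), ("a1", "A", 2),
    ("b1", "B", 1), ("b1", "B", 2),
}
PART_OF = {("a1", "b1", 1), ("a1", "b1", 2)}


def class_part_of(cls_a, cls_b, instance_of, part_of):
    """True if every instance of cls_a at any time t is, at t, part of some
    instance of cls_b (vacuously true when cls_a has no instances)."""
    for (a, cls, t) in instance_of:
        if cls != cls_a:
            continue
        has_whole = any(
            part == a and time == t and (whole, cls_b, t) in instance_of
            for (part, whole, time) in part_of
        )
        if not has_whole:
            return False
    return True
```

Dropping the instance-level fact for time 2 makes the check fail, which shows why the time index matters: a1 would then be an A at time 2 without being part of any B at that time.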

Page 100: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

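derives_from

Figure: derives_from. Classes C, C' and C1; instances c and c' at t, and a distinct instance c1 at t1, along a time axis.

Example: zygote derives_from ovum; zygote derives_from sperm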

Page 101: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

transformation_of

Figure: transformation_of. Classes C and C1; the same instance c shown at t and at t1, along a time axis.

Examples: pre-RNA → mature RNA; child → adult

C2 transformation_of C1 is defined as: Given any instance c of C2, c was at some earlier time an instance of C1
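
The transformation_of definition can likewise be tested against a record of which class each instance belonged to over time. This is a minimal sketch: the histories and the function below are hypothetical, using the slide's child and adult example.

```python
# Hypothetical histories: for each instance, the class it belonged to at each time.
HISTORY = {
    "person_1": {1: "child", 2: "adult"},
    "person_2": {1: "child", 2: "child", 3: "adult"},
}


def transformation_of(c2, c1, history):
    """True if every instance of c2 was, at some earlier time, an instance of c1."""
    for classes_by_time in history.values():
        times_as_c2 = [t for t, cls in classes_by_time.items() if cls == c2]
        if not times_as_c2:
            continue  # this individual is never an instance of c2
        first_c2 = min(times_as_c2)
        # Checking the earliest c2 time is enough: an earlier c1 time for it
        # is also earlier than every later c2 time.
        was_c1_before = any(
            t < first_c2 and cls == c1 for t, cls in classes_by_time.items()
        )
        if not was_c1_before:
            return False
    return True
```

The same individual appears on both sides of the relation, which is what separates transformation_of from derives_from, where a new instance comes into existence.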

Page 102: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

embryological development

Figure: embryological development. Classes C and C1; instance c shown at t and at t1.

Page 103: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

tumor development

Figure: tumor development. Classes C and C1; instance c shown at t and at t1.

Page 104: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Key

In the following discussion:

Classes are in upper case: ‘A’ is the class

Instances are in lower case: ‘a’ is a particular instance

Page 105: Principles for Building Biomedical Ontologies Suzanna Lewis National Center Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor

Placement in the graph

Example: Proteasome complex

