an introduction to ontologies contributors: melissa haendel, chris mungall, david osumi-sutherland

45
An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherlan

Upload: leona-knight

Post on 26-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

An Introduction to Ontologies

Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Page 2: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Setting the stage

1. What is an ontology?2. Understanding ontologies3. Anatomy of an ontology class4. Upper ontologies5. How are ontologies used?

Page 3: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

MouseEcotope GlyProt

DiabetInGene

GluChem

sphingolipid transporter

activity

Common controlled vocabularies indicate the same meaning under different annotation circumstances

Page 4: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Any closed, prescribed list of terms used for classifying data

What is a controlled vocabulary?

Key Features: Terms are not usually defined. Relationships between the terms are

not usually defined. Can be a list.

Here is a CV of wines:Pinot noir, red, chardonnay, Chianti, Bordeaux, Riesling….These are all different types- color, location, varietal, and are present in a list.

Another example would be the map locations list at the end of your Gazeteer.

Page 5: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Any controlled vocabulary that is arranged in a hierarchy

What is a Taxonomy?

Key Features:• Terms are not usually defined.• Relationships between the terms are not usually defined.• Terms are arranged in a hierarchy.

Here is a wine taxonomy:

WineRed

merlotzinfandelcabernetpinot noir

Whitechardonnaypinot grisRiesling

Page 6: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

A taxonomy that contains additional information about use of the terms

What is a Thesaurus?

Key Features:• Terms are not usually defined.• Relationships between the terms are not usually defined.• Terms are arranged in a hierarchy.• Statements about the terms are included such as scope notes or instructions for use.

Some well known thesauri are:WordNet, NCI cancer thesaurus, MeSH

Page 7: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

A formal conceptualization of a specified domain of interestWhat is an ontology?

Key Features:• Terms are defined.• Relationships between the terms are defined, allowing logical inference.• Terms are arranged in a hierarchy.• Expressed in a knowledge representation language such as RDFS, OBO, or OWL.

Some well known ontologies are:SnoMED, Foundational Model of Anatomy, Gene Ontology, Linnean Taxonomy of species

Reproduced with permission, Jason Freenyhttp://web.mac.com/moistproduction/flash/index.html

Page 8: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

http://www.mkbergman.com/?m=20070516

The Ontology spectrum:

Bottom line: you get what you pay for.

OBO

Page 9: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Ontology Languages

Web Ontology Language (OWL)– Standard set of logical constructs for building an

ontology– Many syntaxes

• OWL-RDF/XML• OWL-XML• Manchester

– Many reasoners

OBO-Format– Current formalized by mapping to a subset of OWL

• can be treated as another OWL syntax

Page 10: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

A common misconceptionAre ontologies about terms or things?

When you are arguing about including something in your ontology,

1. Are you arguing about what a term means?

2. Or are you arguing about what term should be adopted in your ontology language to represent a well-characterized entity or concept?

These are terminological questions, and not ontological questions. 1 is a purely linguistic dispute; 2 is primarily a practical question.

The ontological questions are:

What kind of things should we recognize in our ontology?(Never mind, for the moment, what we might choose to call them.)

What are their relations to one another?(Note: What are the relations of their terms/names to one another?)

Adapted from Gary Merrill

Page 11: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

What matters are things

Data annotated to the right schema will be more consistent

DessertIce cream

gelatosno-conesoft-servecustard cream

Caketortedouble-layergaletteangel food cake

DES:000038DES:000034

DES:000456DES:005566DES:000564DES:000744

DES:0003221DES:000667DES:000975DES:000544DES:000765

DES:000034 - Frozen milk and/or cream with various sugars, and flavoring such as fresh fruit and nut purees.[1]

Labels are handles, not things

Page 12: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Why build an ontology? A simple example

Number of genes annotated to each of the following brain parts in an ontology:

brain 20part_of hindbrain 15part_of rhombomere 10

Query brain without ontology 20Query brain with ontology 45

Ontologies can facilitate grouping and retrieval of data

Page 13: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

A

C

B

D EVertebrata

Ascidians

Arthropoda

Annelida

Mollusca

Echinodermata

tetrapod limbs

ampullae

tube feet

parapodia

Querying for genes in similar structures across species

Panganiban et al., PNAS, 1997

Distal-less orthologs participate in distal-proximal pattern formation and appendage morphogenesis

mouse limbsea urchin tube feet

ascidian ampulla polychaete parapodia

Page 14: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

is_a

entity

organism

cat

mammal

animalis_a

is_a

is_ahuman

is_a

instance_of instance_of

Peanut Chris Shaffer

Types, subtypes, and instances

Subtypingrelation

is_a = SubClassOf

Page 15: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

How do you tell if it is an instance or a class?Is there more than one in existence? Is the entity referencing a group of things with common properties?

Class or instance?There is only one Snoopy

There is a class of things labeled “Snoopy toys”

Class or instance? Class or instance?

There is only one Alaska

There is a class of things labeled “States”

There is only one blue morpho in my specimen collection

There is a class of things labeled “Blue morpho butterflies”

Page 16: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

General Principle for Logical DefinitionsDefinitions are of the following Genus-Differentia form:

X = a Y which has one or more differentiating characteristics.

where X is the is_a parent of Y.

Definition: Blue cylinder = Cylinder that has color blue.

Definition: cylinder = Surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane. is_a is_a

Definition: Red cylinder = Cylinder that has color red.

Page 17: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

The True Path Rule

cuticle synthesis--[i] chitin metabolismcell wall biosynthesis--[i] chitin metabolism----[i] chitin biosynthesis----[i] chitin catabolism

chitin metabolism--[i] chitin biosynthesis--[i] chitin catabolism--[i] cuticle chitin metabolism----[i] cuticle chitin biosynthesis----[i] cuticle chitin catabolism--[i] cell wall chitin metabolism----[i] cell wall chitin biosynthesis----[i] cell wall chitin catabolism

GO Before: GO After:

BUT: A fly chitin synthase gene could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.

NOW: all the subClass terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true.

The pathway from a subClass all the way up to its top level parent(s) must be universally true.

Page 18: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Where does the True Path Rule come from?Transitivity. Some relations are transitive, and apply across all levels of the hierarchy.

For example, a cat is_a mammal, and a mammal is_a vertebrateSOa cat is_a vertebrate

=> This is the true path rule and is because the is_a relation is transitive.

Some properties are not transitive.

For example, head has_quality round.and,

head part_of organism. So is the organism round? Of course not!

BUT, eyes are part_of head, and head part_of organism, SO eye part_of organism is true, because part_of is a tranistive relation.

Relations are logically defined in a common relation ontology or within each ontology that uses them.

≠>

Page 19: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Relationships and definitions

A relationship from one class to another is a formalized part of its definition (an object property in OWL)

A subtype relation (is_a in OBO, SubClassOf in OWL) specifies necessary conditions for membership of a class.

For example, finger part_of hand (all finger part_of some hand) states that a necessary condition of being in the class finger is to be part of some hand.

So… if a finger exists, it is part of some hand. But…this does not mean that if a hand exists, it has as a part a finger.

Page 20: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

DirectionalityOften, the order of the terms in an assertion will matter:We can assert:

adult transformation_of child

but notchild transformation_of adult

More about using ontology classes and relations

Universality Entities should have the same meanings on every occasion of use. (= They should refer to the same universal types)

Basic ontological relations such as SubClassOf and part_of should be used in the same way by all ontologiesFor example, if you need a non-transitive part relation, then define a new relation for this purpose.

Page 21: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Meissirel C et al. PNAS 1997;94:5900-5905

©1997 by National Academy of Sciences

Symmetric

For example:

Retinal ganglion cell connected_to lateral geniculate nucleus

AND

Lateral geniculate nucleus connected_to Retinal ganglion cell

Some relations are Symmetric.

If A is connected_to B, then B is connected_to A.

Page 22: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

muscle

anatomical structure

lumen

anatomical space

lumen of gut

anatomical space

anatomical structure

Some classes are declared to never share any instances in common

OWL DisjointWith OBO: disjoint_from

NO!

Page 23: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

About reasonersA piece of software able to infer logical consequences from a set of asserted facts or axioms.

They are used to check the logical consistency of the ontologies and to extend the ontologies with "inferred" facts or axioms

For example, a reasoner would infer:

Major premise: All mortals die.Minor premise: Some men are mortals.Conclusion: Some men die.

Different reasoners can perform slightly differently. There are a number of reasoners to be aware of:ELK, HerMIT, Pellet, etc. We’ll use these later in the protégé tutorial.

Page 24: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Different kinds of definitions An ontology is a collection of axioms

An axiom is simply a sentence or a statement Axioms can be

non-logical (aka “annotations” or “text definitions”)

• E.G. GO_0000262 has synonym ‘mtDNA’

• opaque to reasoners logical

• well-defined semantics

• understood by reasoners

• Example: SubClassOf axioms

• Arguments can be classes or class expressions (for example, the class of things that are parts of the leg)

Page 25: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Anatomy ontologies: Exemplar use case

1. What makes anatomy ontologies different?

2. What kinds of anatomy ontologies exist?

3. Anatomy of an anatomy ontology class4. Upper ontologies5. How are anatomy ontologies used?

Page 26: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

There are many useful ways to classify parts of organisms:

its parts and their arrangement its relation to other structures

what is it: part of; connected to; adjacent to, overlapping?

its shape its function its developmental origins its species or clade its evolutionary history

Cajal 1915, “Accept the view that nothing in nature is useless, even from the human point of view.”

Page 27: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Not all classification is useful

Be practical: Build ontologies for what you need and for what can be reused

About thirty years ago there was much talk that geologists ought only to observe and not theorise; and I well remember some one saying that at this rate a man might as well go into a gravel-pit and count the pebbles and describe the colours.

C. Darwin

Page 28: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Classifying anatomy

appendage

antenna forewing

wing

hindwing

Page 29: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Relationships record classifications too

leg

part_of some ‘thoracic segment

wing

‘leg’ SubClassOf part_of some thoracic segment

Page 30: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

The knowledge in an ontology can make the reasons for classification explicit

Any sense organ that functions in the detection of smell is an olfactory sense organ

sense organcapable_of some detection of smell

olfactory sense organ

Page 31: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

nose

sense organ

nose

capable_of some detection of smell

Classifying

sense organcapable_of some detection of smell

olfactory sense organ

nose

=> These are necessary and sufficient conditions, also called an equivalent class axiom

Page 32: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

It is difficult to keep track of multiple classification chains and: • ensure completeness• avoid redundancy• avoid true path rule violation

Why create class equivalent classes and class restrictions?

Page 33: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Compositionality and avoiding asserted multiple inheritance

We can logically define composed classes and create complex definitions from simpler ones

aka: building blocks, cross-products, logical definitionsDescriptions can be composed at any time

Ontology construction time (pre-composition) Annotation time (post-composition)

Formal necessary and sufficient definitions + a reasoner

Automatic (and therefore manageable) classification Requires subtype classification, so apart from the root

term(s), no term should lack a superclass parent.

Let the reasoner do the work!

Page 34: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

post-composed versus pre-composed anatomical entities are logically equivalent

Plasma membrane of spermatocyte• Plasma membrane [GO CC]• Spermatocyte [Cell Ontology]

plasma membrane and part_of some spermatocyte

Gene Ontology Relation Ontology Cell Ontology

Genus Differentia

Page 35: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

What kinds of anatomy ontologies exist?Mouse

MA (adult) EMAP / EMAPA (embryonic)

Human FMA (adult) EHDAA2 (CS1-CS20)

Amphibian AAO XAO

Fish ZFA (zebrafish) MFO (medaka) TAO (teleosts)

Nematode WBbt (c elegans)

Arthropod FBbt (Drosophila) TGMA (Mosquito) HAO (hymenoptera) Arthropod anatomy ontology

Plant ontology

Species-centric and multi-species ontologies

Species neutral ontologies

CARO (common anatomy reference ontology)

Uberon (cross-species anatomy)vHOG (vertebrate homologous organs)

CL (cell ontology)

GO (gene ontology)

Phenotype ontologies

MP mammalian phenotype

HP human phenotype

WB worm phenotype

Page 36: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

chemical entities

Many perspectives, many ontologies – that overlap in content

grossanatomy

tissues

cellscellanatomy

proteins

phenotypes

clinical disorders

processes

physiological processes

development

reactions

cellular processes

behavior

evolutionary characters

nervous systemneural crest

Page 37: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Domain ontologies are organized according to upper ontologies (Basic Formal Ontology) that specify the general types of things that exist

RELATION TO TIME

GRANULARITY

CONTINUANT OCCURRENT

INDEPENDENT DEPENDENT

ORGAN ANDORGANISM

Organism(NCBI

Taxonomy)

Anatomical Entity

(FMA, CARO)

OrganFunction

(FMP, CPRO) Phenotypic Quality(PaTO)

Biological Process(GO)

CELL AND CELLULAR COMPONENT

Cell(CL)

Cellular Component(FMA, GO)

Cellular Function(GO)

MOLECULEMolecule

(ChEBI, SO,RnaO, PrO)

Molecular Function(GO)

Molecular Process(GO)

=> Classification according to these higher level types helps ensure the True Path Rule holds

Page 38: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

BFO high level classes

Continuant: An entity that exists in full at any time in which it exists at all, persists through time while maintaining its identity and has no temporal parts.

Independent continuant: A continuant that is a bearer of quality and realizable entities, in which other entities inhere and which itself cannot inhere in anything.

Dependent continuant: A continuant that is either dependent on one or other independent continuant bearers or inheres in or is borne by other entities.

Occurrent: An entity that has temporal parts and that happens, unfolds or develops through time. Sometimes also called perdurants.

Page 39: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

The Common Anatomy Reference OntologyCARO is a structural classification based on

granularity

From the bottom up:Cell componentCellPortion of tissueMulti-tissue structure

From the top down:Organism subdivisionAnatomical system

Acellular structures

CARO is an upper reference ontology that can be used to structure new anatomy ontologies

Page 40: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Using CARO as a templateis_a

zebrafish AO

CARO

Cell and cell component are cross-referenced to GO

Zebrafish classes are asserted to be subclasses of CARO classes

Page 41: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Example of complexity arising from multiple species-contexts

erythrocyte

cell

nucleate cell enucleate cell

not applicable in all contexts

Page 42: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Example of complexity arising from multiple species-contexts

erythrocyte

nucleate erythrocyte

enucleate erythrocyte

cell

nucleate cell enucleate cell

zebrafish nucleate

erythrocyte

human erythrocyteZFA:0009256

… …

CL:0000562

CL:0000232

CL:0000592

FMA:81100

species ontologiesattached at appropriatelevel

Page 43: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Developmental Biology, Scott Gilbert, 6th ed.

Using reasoners to detect errors

Fruit fly FBbt ‘tibia’ Human FMA ‘tibia’

UBERON: tibia

UBERON: bone

is_a

is_a

is_a

Vertebrata

Drosophila melanogaster

part_of

Homo sapiens

is_a

only_in_taxon

part_of

disjoint_with

Page 44: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Representing different levels of granularity

lateral line development

?

?

GO:cilium part_of CL:hair cell part_of CL:neuromast

CL:hair cell part_of CL:neuromast

CL:neuromast part_of ZFA:lateral line

GO

cilium development

hair cell development

neuromast development

Page 45: An Introduction to Ontologies Contributors: Melissa Haendel, Chris Mungall, David Osumi-Sutherland

Ontology considerations An ontology is a classification There are many useful ways to classify anatomy Maintaining multiple classification schemes by hand

is impracticalSo you should automate it.

Everybody makes mistakesSo you should let the computer find errors for you

Re-use other people’s work where possibleImport class hierarchiesUse common patterns

Cautionary note – formal languages have limitations. Don’t expect to be able to express everything!