1 how to build an ontology barry smith

Post on 14-Dec-2015


1

How to Build an Ontology

Barry Smith

http://ontology.buffalo.edu/smith

2

Ontology

A classification of entities and the relations between them.

An ontology is a list of types structured by relations.

Defined by a scientific field's vocabulary and by the canonical formulations of its theories.

Scientific theories consist of generalizations.

What I will not be talking about: XML, OWL, ..., data(types), information models, file formats ...

3

Top-Level

GO

OBO, OBO Core

NCBO

FMA

NCBC Roadmap Centers

NCI EVS

NECTAR (National Electronic Clinical Trials and Research) Network

4

Instances are not included in an ontology

It is the generalizations that are important

(but instances must still be taken into account)

5

A 515287 DC3300 Dust Collector Fan

B 521683 Gilmer Belt

C 521682 Motor Drive Belt

6

Ontology Types Instances

7

Ontology = A Representation of Types

8

Ontology = A Representation of Types

Each node of an ontology consists of:

• preferred term (aka term)

• term identifier (TUI, aka CUI)

• synonyms

• definition, glosses, comments
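As a rough illustration of the node structure listed above, a node could be held in a small record type. This is only a sketch: the class name `OntologyNode`, its field names, the identifier `EX:0000001` and the synonym are assumptions made for the example, not part of any ontology format.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One ontology node with the four parts listed on the slide."""
    term_id: str                                        # term identifier
    preferred_term: str                                 # preferred term
    synonyms: list[str] = field(default_factory=list)   # synonyms
    definition: str = ""                                # definition, glosses, comments


# Illustrative node; the identifier and the synonym are made up for this sketch.
cell = OntologyNode(
    term_id="EX:0000001",
    preferred_term="cell",
    synonyms=["cellular unit"],
    definition="an anatomical structure which consists of cytoplasm "
               "surrounded by a plasma membrane",
)
```

Keeping the identifier separate from the preferred term means the term can be reworded later without breaking annotations that point at the identifier.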

9

Ontology = A Representation of Types

Nodes in an ontology are connected by relations:

primarily: is_a (= is subtype of) and part_of

designed to support search, reasoning and annotation

10

Rules for formatting terms

• Terms are names of types: if you prefix a term with ‘the type ___’, the term should still make sense

• Hence: terms should be in the singular

• Terms should be lower case

• Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)

11

Motivation: to capture reality

Inferences and decisions we make are based upon what we know of reality.

An ontology is a computable representation of this underlying bio(techno)logical reality.

Enables a computer to reason over the data in (some of) the ways that we do.

12

Biomedical ontology integration / interoperability

Will never be achieved through integration of meanings or concepts

The problem is precisely that different user communities use different concepts

What’s really needed is to have well-defined commonly used relationships

13

Concepts

Biomedical ontology integration will never be achieved through integration of meanings or concepts

The problem is precisely that different user communities use different concepts

14

Concepts

Concepts are in your head and will change as our understanding changes

Ontologies represent types: not concepts, meanings, ideas ...

Types exist, with their instances, in objective reality

– including types of experimental process, design, method, ...

15

Most ontologies are execrable

But some good ontologies do already exist

• as far as possible don’t reinvent

• use the power of combination and collaboration

• ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies

16

Why do we need rules/standards for good ontology?

Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking): unintuitive rules for classification lead to errors

Intuitive rules facilitate training of curators and annotators

Common rules allow alignment with other ontologies

Logically coherent rules enhance harvesting of content through automatic reasoning systems

17

Rules on types

Don’t confuse types with concepts

Don’t confuse types with ways of getting to know types

Don’t confuse types with ways of talking about types

Don’t confuse types with data about types

18

First Rule: Univocity

Terms (including those describing relations) should have the same meanings on every occasion of use.

In other words, they should refer to the same types in reality

19

Second Rule: Positivity

There are no negative types

Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types.

(There are also no conjunctive and disjunctive types: rabbit and nailfile; rabbit or nosewipe)

20

Third Rule: Objectivity

Which types exist is not a function of our biological knowledge.

Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.
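The Second and Third Rules lend themselves to a simple automated screen over term names. The sketch below is a crude check, not a complete test: it only looks for the `non-` prefix and for the three words quoted in the Third Rule, and the function name `rule_violations` is an assumption of this example.

```python
import re

# Patterns taken from the examples in the two rules; they are not exhaustive.
NEGATIVE_PREFIX = re.compile(r"^non-", re.IGNORECASE)
KNOWLEDGE_WORDS = {"unknown", "unclassified", "unlocalized"}


def rule_violations(term: str) -> list[str]:
    """Return the names of the rules that a term name appears to break."""
    problems = []
    if NEGATIVE_PREFIX.match(term):
        problems.append("positivity")
    words = set(re.findall(r"[a-z]+", term.lower()))
    if words & KNOWLEDGE_WORDS:
        problems.append("objectivity")
    return problems
```

A term that passes this screen may still break either rule; a curator has to make the final call.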

21

Fourth Rule: Single Inheritance

No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
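A check for the Fourth Rule can be sketched as a scan over asserted is_a links. The example assumes the hierarchy is given as (child, parent) pairs and that every asserted link points to an immediate parent; the type names A to D are placeholders.

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Return {child: sorted parents} for every child with more than one is_a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Placeholder hierarchy: A has two asserted parents, B has one.
edges = [("A", "B"), ("A", "C"), ("B", "D")]
violations = multiple_parents(edges)
```

Each entry in `violations` is a candidate for review rather than an automatic error, since the fix depends on which of the parent links misuses is_a.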

22

Rule of Single Inheritance

no diamonds:

C

is_a2

B

is_a1

A

23

Problems with multiple inheritance

B C

is_a1 is_a2

A

‘is_a’ no longer univocal

24

‘is_a’ is pressed into service to mean a variety of different things

shortfalls from single inheritance are often clues to incorrect entry of terms and relations

the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

25

is_a Overloading

serves as obstacle to integration with neighboring ontologies

The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.

26

To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via force majeure

27

Current Best Practice: The Foundational Model of Anatomy

28

[Figure: fragment of the FMA around the pleural sac, with is_a and part_of links among: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Organ Part, Organ Component, Organ Subdivision, Tissue, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

29

Current Best Practice: The Foundational Model of Anatomy

Follows formal rules for definitions laid down by Aristotle.

When A is_a B, the definition of ‘A’ takes the form:

an A =def. a B which ...

a human being =def. an animal which is rational
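The Aristotelian form can be captured by storing each definition as a (genus, differentia) pair and generating the sentence from it. This is a minimal sketch: the table `DEFINITIONS` and the helpers `article` and `define` are assumptions of this example, and the article rule is a naive first-letter test.

```python
# term -> (genus, differentia), i.e. A -> (B, "which ...")
DEFINITIONS = {
    "human being": ("animal", "is rational"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
}


def article(noun: str) -> str:
    """Naive choice between 'a' and 'an' based on the first letter only."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def define(term: str) -> str:
    """Render a stored definition in the form 'an A =def. a B which ...'."""
    genus, differentia = DEFINITIONS[term]
    return f"{article(term)} {term} =def. {article(genus)} {genus} which {differentia}"
```

Because the genus is stored as data, the same table also yields the is_a link for each defined term, so the definition and the hierarchy cannot drift apart.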

30

FMA Example

Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane, with or without a cell nucleus

Plasma membrane =def. a cell part that surrounds the cytoplasm

31

The FMA regimentation

Brings the advantage that each definition reflects the position in the hierarchy to which a defined term belongs.

The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.

The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation

32

GO is now adopting structured definitions that contain both genus and differentiae

Essence = Genus + Differentiae

neuron cell differentiation =

Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of ..)

Differentiae: acquires features of a neuron

33

Ontology alignment

One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:

Cell Types in GO                    Cell Types in the Cell Ontology

cone cell fate commitment           retinal_cone_cell

keratinocyte differentiation        keratinocyte

adipocyte differentiation           fat_cell

dendritic cell activation           dendritic_cell

lymphocyte proliferation            lymphocyte

T-cell homeostasis                  T_lymphocyte

garland cell differentiation        garland_cell

heterocyst cell differentiation     heterocyst

34

Alignment of the two ontologies will permit the generation of consistent and complete definitions

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
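To show how the Cell Ontology entry above could feed such a combined definition, here is a minimal reader for tag-value lines of that shape. It is a sketch that handles only the simple `tag: value` lines shown here, not the full OBO file format, and the function name `parse_stanza` is an assumption.

```python
STANZA = """\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Collect 'tag: value' lines into a dict of lists, since tags may repeat."""
    record: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        record.setdefault(tag, []).append(value)
    return record


term = parse_stanza(STANZA)
# Identifiers of the cell types that an osteoblast develops from.
develops_from = [
    value.split()[1]
    for value in term["relationship"]
    if value.startswith("develops_from ")
]
```

The two `develops_from` identifiers are the hooks for the "osteoprogenitor cell or cranial neural crest cell" part of the new definition, provided their names are looked up in the same file.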

35

Other Ontologies to be aligned with GO

Chemical ontologies

– 3,4-dihydroxy-2-butanone-4-phosphate synthase activity

Anatomy ontologies

– metanephros development

GO itself

– mitochondrial inner membrane peptidase activity

OBO core

36

eventually to comprehend all of OBO

37

Top Level OBO-UBO

continuants: objects, characteristics, spatial regions

occurrents: processes, temporal regions, spatio-temporal regions

38

Definitions should be intelligible to both machines and humans

Machines can cope with the full formal representation

Humans need modularity

39

Fifth Rule: Terms and relations should have clear definitions

These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality:

– actual cells, actual portions of cytoplasm, and so on

40

But

Some terms are primitive (cannot be defined)

AVOID CIRCULAR DEFINITIONS!

Avoid definitions of the forms:

An A is an A which is B (person = person with identity documents)

An A is the B of an A (heptolysis = the causes of heptolysis)
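Both circular forms share one symptom: the defined term reappears inside its own definition. A crude screen for that symptom might look like the sketch below. The function name `is_circular` is an assumption, and the check will also flag harmless cases in which the term is merely part of a longer phrase (for example ‘cell’ inside ‘cell part’).

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """True if the defined term occurs as a whole phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None
```

Applied to the first example above, `is_circular("person", "person with identity documents")` flags the definition, while the FMA definition of plasma membrane passes.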

41

[Figure: hierarchy of types with instances beneath. Labels: substance, organism, animal, mammal, cat, siamese, frog; types, leaf type, instances]

42

Benefits of well-defined relationships

If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C).

‘Find all DNA binding proteins’ should also find all transcription factor proteins, because

transcription factor is_a DNA binding protein
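The query behaviour described here follows from walking is_a links upward. The sketch below assumes a single-parent hierarchy stored as a dict; the link from ‘DNA binding protein’ to ‘protein’ and the helper names are added only for illustration.

```python
# child -> is_a parent (single inheritance assumed)
IS_A = {
    "transcription factor": "DNA binding protein",
    "DNA binding protein": "protein",  # illustrative link, not from the slide
}


def ancestors(term: str) -> list[str]:
    """All types reachable from term by following is_a links upward."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def find_all(target: str, terms: list[str]) -> list[str]:
    """Terms that are the target type itself or one of its subtypes."""
    return [t for t in terms if t == target or target in ancestors(t)]


hits = find_all(
    "DNA binding protein",
    ["transcription factor", "DNA binding protein", "protein"],
)
```

The search returns ‘transcription factor’ without anyone having annotated it as a DNA binding protein, which is the cascade the slide describes.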

43

What happens when an ontology has no clear definition of A is_a B:

cancer documentation is_a cancer

disease prevention is_a disease

living subject is_a information object representing an animal or complex organism

individual allele is_a act of observation

44

[Figure: fragment of the FMA around the pleural sac, with is_a and part_of links among: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Organ Part, Organ Component, Organ Subdivision, Tissue, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

45

How to define A is_a B

A is_a B =def.

all instances of A are as a matter of biological science also instances of B

here A and B are names of types in reality

46

How to define A is_a B

A is_a B =def.

for all a if a instance_of A, then a instance_of B
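Read against recorded data, this definition gives a test that can refute an is_a assertion but never prove it, because the definition quantifies over all instances and not only those on record. A minimal sketch, with made-up instance names and an assumed helper name `counterexamples`:

```python
def counterexamples(instances_of, a_type, b_type):
    """Recorded instances of a_type that are not recorded instances of b_type."""
    return instances_of[a_type] - instances_of[b_type]


# Hypothetical instance data for two of the types used in this talk.
instances_of = {
    "cat": {"tibbles", "felix"},
    "mammal": {"tibbles", "felix", "rex"},
}
```

An empty result for (‘cat’, ‘mammal’) is consistent with cat is_a mammal; a non-empty result for (‘mammal’, ‘cat’) shows that the reverse assertion is false.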

47

Kinds of relations

Between types:

– is_a, part_of, ...

Between an instance and a type:

– this explosion instance_of the type explosion

Between instances:

– Mary’s heart part_of Mary

48

Part_of as a relation between types is more problematic than is standardly supposed

heart part_of human being ?

human heart part_of human being ?

human being has_part human testis ?

testis part_of human being ?

49

Definition of part_of as a relation between types

A part_of B =def. all instances of A are instance-level parts of some instance of B

human testis part_of adult human being
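The all-some reading explains why ‘human testis part_of human being’ can hold while ‘human being has_part human testis’ fails. The sketch below checks both directions over a toy set of instances; all instance names and the helper `all_some` are assumptions of the example.

```python
def all_some(instances_a, instances_b, related):
    """True if every instance of A stands in the relation to some instance of B."""
    return all(
        any((a, b) in related for b in instances_b)
        for a in instances_a
    )


# Hypothetical instances.
humans = {"Mary", "John"}
human_hearts = {"Mary's heart", "John's heart"}
human_testes = {"John's testis"}

# Instance-level part_of as (part, whole) pairs, and its inverse has_part.
instance_part_of = {
    ("Mary's heart", "Mary"),
    ("John's heart", "John"),
    ("John's testis", "John"),
}
instance_has_part = {(whole, part) for part, whole in instance_part_of}

heart_part_of_human = all_some(human_hearts, humans, instance_part_of)
testis_part_of_human = all_some(human_testes, humans, instance_part_of)
human_has_part_testis = all_some(humans, human_testes, instance_has_part)
```

The last check fails because Mary has no testis, so the type-level has_part assertion does not follow from the type-level part_of assertion.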

50

Instance level

this nucleus is adjacent to this cytoplasm

implies:

this cytoplasm is adjacent to this nucleus

Type level

nucleus adjacent_to cytoplasm

Not: cytoplasm adjacent_to nucleus

seminal vesicle adjacent_to urinary bladder

Not: urinary bladder adjacent_to seminal vesicle
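The same all-some reading shows how a relation that is symmetric between instances can fail to be symmetric between types. In the sketch below the instance names are invented, `c2` stands for the cytoplasm of a cell without a nucleus, and the helper `all_some` is an assumption of the example.

```python
def all_some(instances_a, instances_b, related):
    """True if every instance of A stands in the relation to some instance of B."""
    return all(
        any((a, b) in related for b in instances_b)
        for a in instances_a
    )


nuclei = {"n1"}
cytoplasms = {"c1", "c2"}  # c2: cytoplasm of a cell that has no nucleus

adjacent = {("n1", "c1")}
adjacent |= {(y, x) for x, y in adjacent}  # adjacency is symmetric between instances

nucleus_adjacent_to_cytoplasm = all_some(nuclei, cytoplasms, adjacent)
cytoplasm_adjacent_to_nucleus = all_some(cytoplasms, nuclei, adjacent)
```

Every nucleus here is adjacent to some cytoplasm, but `c2` is adjacent to no nucleus, so only the first type-level assertion holds.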

51

Definitions of the all-some form

allow cascading inferences

If A R1 B and B R2 C, then we know that

every A stands in R1 to some B, but we know also that, whichever B this is, it can be plugged into the R2 relation
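The cascade can be written out in first-order form. The notation below is an assumption of this sketch: $\mathrm{inst}(x, X)$ stands for "x instance_of X", and $r_1$, $r_2$ are the instance-level relations underlying $R_1$ and $R_2$.

```latex
\begin{align*}
A \; R_1 \; B &:\quad \forall a\,\bigl(\mathrm{inst}(a, A) \rightarrow \exists b\,(\mathrm{inst}(b, B) \wedge r_1(a, b))\bigr) \\
B \; R_2 \; C &:\quad \forall b\,\bigl(\mathrm{inst}(b, B) \rightarrow \exists c\,(\mathrm{inst}(c, C) \wedge r_2(b, c))\bigr) \\
\text{hence} &:\quad \forall a\,\bigl(\mathrm{inst}(a, A) \rightarrow \exists b\,\exists c\,(\mathrm{inst}(b, B) \wedge \mathrm{inst}(c, C) \wedge r_1(a, b) \wedge r_2(b, c))\bigr)
\end{align*}
```

The third line follows because the witness $b$ supplied by the first line is an instance of $B$, so the second line applies to it. Whether the chain collapses further into a single relation between A and C depends on the particular relations involved.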

52

[Figure: transformation_of. One and the same instance c instantiates the types C and C1 at different times t and t1 along a time axis. Examples: pre-RNA → mature RNA; child → adult]

53

transformation_of

A transformation_of B =Def. Every instance of A was at some earlier time an instance of B

adult transformation_of child
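Checking this definition needs instantiation data that carries a time. The sketch below assumes each instance has a history of (time, type) records; the names, the times and the function signature are all illustrative.

```python
def transformation_of(history, a_type, b_type):
    """True if every recorded instance of a_type was earlier an instance of b_type.

    history maps an instance name to a list of (time, type) records.
    """
    for records in history.values():
        for t, typ in records:
            if typ != a_type:
                continue
            was_b_earlier = any(
                t1 < t and typ1 == b_type for t1, typ1 in records
            )
            if not was_b_earlier:
                return False
    return True


# Hypothetical histories; times are ages in years.
history = {
    "mary": [(5, "child"), (30, "adult")],
    "john": [(8, "child")],
}
```

With this data ‘adult transformation_of child’ holds, while ‘child transformation_of adult’ fails because neither child record is preceded by an adult record.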

54

[Figure: embryological development. An instance c shown at times t and t1, with types C and C1]

55

[Figure: tumor development. An instance c shown at times t and t1, with types C and C1]

56

[Figure: derives_from. Instances c of C and c' of C' at time t, and a distinct instance c1 of C1 at time t1, along a time axis. Example: zygote derives_from ovum, sperm]

57

One main obstacle to integrating biological and experiment-generated data

Most ontologies have no facility for dealing with time and instances

58

EXPO: Experiment Ontology

59

representational style part_of experimental hypothesis

experimental actions part_of experimental design

60

tool part_of experimental design

(confuses object with specification)

61

hypothesis driven is_a Galilean

62

physical is_a scientific experiment

(avoid abbreviations)

63

admin info about experiment is_a scientific experiment

64

where is the top level? objects, processes, characteristics

65

is_a and part_of never cross categorial divides

(cf. tripartite organization of GO)

if A is_a B

then A is an object type iff B is an object type

then A is a process type iff B is a process type

then A is a characteristic type iff B is a characteristic type
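If every type is tagged with its top-level category, the rule above becomes a mechanical check on each asserted link. The sketch below uses an invented category table and invented links; only the three category names come from the slides.

```python
# Hypothetical top-level category for each type.
CATEGORY = {
    "heart": "object",
    "organ": "object",
    "heartbeat": "process",
}


def crossing_links(links, category):
    """Return the (A, B) links whose two ends fall under different top-level categories."""
    return [(a, b) for a, b in links if category[a] != category[b]]


# Illustrative is_a assertions; the second one crosses a categorial divide.
links = [("heart", "organ"), ("heartbeat", "organ")]
bad_links = crossing_links(links, CATEGORY)
```

A check of this kind presupposes that the ontology has a top level at all, which is the gap noted on the previous slide.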

66

Some thoughts on time

continuants vs. occurrents

objects, characteristics vs. processes

time    timeline

day    daytime

menstrual cycle    high tide

67

What is time?

68

Top Level OBO-UBO

continuants: objects, characteristics, spatial regions

occurrents: processes, temporal regions, spatio-temporal regions

Space = the largest spatial region

Time = the largest temporal region

69

Relative time, subjective time

terms describing (regions of) time in special (qualitative, perspective-dependent, landmark dependent) ways

tomorrow, yesterday

uptown, downtown

phase A trial

Wednesday

70

Characteristics are continuants

many characteristics have realizations, applications or executions, which are processes

plan

design

method

menstrual cycle

function

71

GlaxoSmithKline*

What we need is “industrial-strength” ontologies with a consistent and rich representation formalism that are amenable for use as an integration framework, and support reasoning capabilities. We anticipate that pharma’s need to bring together mountains of data and information and to properly analyse that information all depend on having a stable, well-developed semantic framework that links information/data and that allows reasoning systems to perform some of our more "mundane" analysis work.

*Robin McEntire

72

OBO Relation Ontology

“Relations in Biomedical Ontologies”, Genome Biology, Apr. 2005

relations for continuants behave differently from relations for processes

73

part_of for component types is time-indexed

A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t

74

part_of for process types is not time-indexed

A part_of B =def. given any particular a, if a is an instance of A, then there is some instance b of B such that a is an instance-level part_of b

75

Main Upper Level Ontologies

CYC

Cycorp (Austin, TX)

human being = partially tangible thing

SUO (Suggested Upper Ontology)

IEEE

monkey, body covering

DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)

BFO (Basic Formal Ontology)

76

SUO top level

Entity
  – Physical
      • Object
          – SelfConnectedObject
              » Substance
              » CorpuscularObject
              » Food
          – Region
          – Collection
          – Agent
      • Process
  – Abstract
      • SetOrClass
      • Relation
      • Quantity
          – Number
          – PhysicalQuantity
      • Attribute
      • Proposition

77

MIGS Specification Top Levels

Organism

Phenotype

Environment

Sample Process

Data Process
