Upload: alexia-shelton

Post on 14-Dec-2015

Page 1: 1 How to Build an Ontology Barry Smith

1

How to Build an Ontology

Barry Smith

http://ontology.buffalo.edu/smith

Page 2: 1 How to Build an Ontology Barry Smith

2

Ontology

A classification of entities and the relations between them.

An ontology is a list of types structured by relations.

Defined by a scientific field's vocabulary and by the canonical formulations of its theories.

Scientific theories consist of generalizations.

What I will not be talking about: XML, OWL, ..., data(types), information models, file formats ...

Page 3: 1 How to Build an Ontology Barry Smith

3

Top-Level

GO

OBO, OBO Core

NCBO

FMA

NCBC Roadmap Centers

NCI EVS

NECTAR (National Electronic Clinical Trials and Research) Network

Page 4: 1 How to Build an Ontology Barry Smith

4

Instances are not included in an ontology

It is the generalizations that are important

(but instances must still be taken into account)

Page 5: 1 How to Build an Ontology Barry Smith

5

A 515287 DC3300 Dust Collector Fan

B 521683 Gilmer Belt

C 521682 Motor Drive Belt

Page 6: 1 How to Build an Ontology Barry Smith

6

Ontology Types Instances

Page 7: 1 How to Build an Ontology Barry Smith

7

Ontology = A Representation of Types

Page 8: 1 How to Build an Ontology Barry Smith

8

Ontology = A Representation of Types

Each node of an ontology consists of:

• preferred term (aka term)

• term identifier (TUI, aka CUI)

• synonyms

• definition, glosses, comments
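The node structure listed on this slide can be sketched as a small record type. This is only an illustration: the class name `OntologyNode` and its field names are assumptions, not part of any ontology standard; the example values reuse the osteoblast entry (CL:0000062) that appears later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One node of an ontology, with the four items listed on the slide."""
    identifier: str                                      # term identifier
    preferred_term: str                                  # preferred term (aka term)
    synonyms: list[str] = field(default_factory=list)    # alternative names
    definition: str = ""                                 # textual definition
    comments: list[str] = field(default_factory=list)    # glosses, comments


# Example node; the values come from the osteoblast slide of this deck.
osteoblast = OntologyNode(
    identifier="CL:0000062",
    preferred_term="osteoblast",
    definition="A bone-forming cell which secretes an extracellular matrix.",
)
```

Relations such as is_a and part_of are deliberately left out of the node: they connect nodes and are more naturally stored as edges between identifiers.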

Page 9: 1 How to Build an Ontology Barry Smith

9

Ontology = A Representation of Types

Nodes in an ontology are connected by relations:

primarily: is_a (= is subtype of) and part_of

designed to support search, reasoning and annotation

Page 10: 1 How to Build an Ontology Barry Smith

10

Rules for formatting terms

• Terms are names of types: if you prefix a term with ‘the type ___’, the term should still make sense

• Hence: terms should be in the singular

• Terms should be lower case

• Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)
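Two of these rules (lower case, singular) lend themselves to a rough automatic check. The sketch below is an assumption-laden illustration: `check_term` is a hypothetical helper, and its plural test is a crude heuristic that will misjudge some terms. The abbreviation rule (‘breast’ for ‘breast tumor’) needs human judgment and is not attempted.

```python
def check_term(term: str) -> list[str]:
    """Return a list of formatting problems found in a candidate term."""
    problems = []
    if term != term.lower():
        problems.append("not lower case")
    words = term.split()
    last = words[-1].lower() if words else ""
    # Crude plural heuristic (an assumption): a final 's' that is not part of
    # a common singular ending such as 'ss', 'us' or 'is'.
    if last.endswith("s") and not last.endswith(("ss", "us", "is")):
        problems.append("possibly plural")
    return problems


print(check_term("breast tumor"))   # no problems reported
print(check_term("Cells"))          # flagged twice
```

A check like this can only flag candidates for a curator to review; it cannot decide whether a term really names a type.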

Page 11: 1 How to Build an Ontology Barry Smith

11

Motivation: to capture reality

Inferences and decisions we make are based upon what we know of reality.

An ontology is a computable representation of this underlying bio(techno)logical reality.

Enables a computer to reason over the data in (some of) the ways that we do.

Page 12: 1 How to Build an Ontology Barry Smith

12

Biomedical ontology integration / interoperability

Will never be achieved through integration of meanings or concepts

The problem is precisely that different user communities use different concepts

What’s really needed is to have well-defined commonly used relationships

Page 13: 1 How to Build an Ontology Barry Smith

13

Concepts

Biomedical ontology integration will never be achieved through integration of meanings or concepts

The problem is precisely that different user communities use different concepts

Page 14: 1 How to Build an Ontology Barry Smith

14

Concepts

Concepts are in your head and will change as our understanding changes

Ontologies represent types: not concepts, meanings, ideas ...

Types exist, with their instances, in objective reality

– including types of experimental process, design, method, ...

Page 15: 1 How to Build an Ontology Barry Smith

15

Most ontologies are execrable

But some good ontologies do already exist

• as far as possible don’t reinvent

• use the power of combination and collaboration

• ontologies are like telephones: they are valuable only to the degree that they are used and networked with other ontologies

Page 16: 1 How to Build an Ontology Barry Smith

16

Why do we need rules/standards for good ontology?

Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking): unintuitive rules for classification lead to errors

Intuitive rules facilitate training of curators and annotators

Common rules allow alignment with other ontologies

Logically coherent rules enhance harvesting of content through automatic reasoning systems

Page 17: 1 How to Build an Ontology Barry Smith

17

Rules on types

Don’t confuse types with concepts

Don’t confuse types with ways of getting to know types

Don’t confuse types with ways of talking about types

Don’t confuse types with data about types

Page 18: 1 How to Build an Ontology Barry Smith

18

First Rule: Univocity

Terms (including those describing relations) should have the same meanings on every occasion of use.

In other words, they should refer to the same types in reality

Page 19: 1 How to Build an Ontology Barry Smith

19

Second Rule: Positivity

There are no negative types

Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types.

(There are also no conjunctive and disjunctive types: rabbit and nailfile; rabbit or nosewipe)

Page 20: 1 How to Build an Ontology Barry Smith

20

Third Rule: Objectivity

Which types exist is not a function of our biological knowledge.

Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.

Page 21: 1 How to Build an Ontology Barry Smith

21

Fourth Rule: Single Inheritance

No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
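A violation of this rule is easy to detect mechanically once the is_a links are available as (child, parent) pairs. The following is a minimal sketch; the function name and the edge-list representation are assumptions, and the type names A, B, C are placeholders.

```python
def multiple_inheritance(is_a_edges):
    """Map each type with more than one is_a parent to its sorted parents."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# A has two is_a parents (B and C), which breaks single inheritance.
edges = [("A", "B"), ("A", "C"), ("B", "D")]
print(multiple_inheritance(edges))
```

Each type reported here is a place where a curator should ask whether both links really mean is_a in the same sense.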

Page 22: 1 How to Build an Ontology Barry Smith

22

Rule of Single Inheritance

no diamonds:

C

is_a2

B

is_a1

A

Page 23: 1 How to Build an Ontology Barry Smith

23

Problems with multiple inheritance

B C

is_a1 is_a2

A

‘is_a’ no longer univocal

Page 24: 1 How to Build an Ontology Barry Smith

24

‘is_a’ is pressed into service to mean a variety of different things

shortfalls from single inheritance are often clues to incorrect entry of terms and relations

the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

Page 25: 1 How to Build an Ontology Barry Smith

25

is_a Overloading

serves as obstacle to integration with neighboring ontologies

The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.

Page 26: 1 How to Build an Ontology Barry Smith

26

To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via force majeure

Page 27: 1 How to Build an Ontology Barry Smith

27

Current Best Practice: The Foundational Model of Anatomy

Page 28: 1 How to Build an Ontology Barry Smith

28

[Figure: FMA hierarchy around the pleural sac, with is_a and part_of links. Node labels: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Organ Part, Organ Component, Organ Subdivision, Tissue, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

Page 29: 1 How to Build an Ontology Barry Smith

29

Current Best Practice: The Foundational Model of Anatomy

Follows formal rules for definitions laid down by Aristotle.

When A is_a B, the definition of ‘A’ takes the form:

an A =def. a B which ...

a human being =def. an animal which is rational

Page 30: 1 How to Build an Ontology Barry Smith

30

FMA Example

Cell =def an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus

Plasma membrane =def a cell part that surrounds the cytoplasm

Page 31: 1 How to Build an Ontology Barry Smith

31

The FMA regimentation

Brings the advantage that each definition reflects the position in the hierarchy to which a defined term belongs.

The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.

The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
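A minimal sketch of this regimentation: each defined term stores only its genus (its is_a parent) and its differentia, and everything else is read off the hierarchy. The table and function names are assumptions. The entry for 'human being' comes from the Aristotelian example on the earlier slide; the entry for 'animal' is a placeholder invented purely to give the hierarchy a second level.

```python
# term -> (genus, differentia)
DEFINITIONS = {
    "human being": ("animal", "is rational"),   # from the slide
    "animal": ("organism", "is sentient"),      # placeholder, an assumption
}


def article(noun: str) -> str:
    return "an" if noun[0] in "aeiou" else "a"


def definition(term: str) -> str:
    """Render a definition in the form: an A =def. a B which ..."""
    genus, differentia = DEFINITIONS[term]
    return f"{article(term)} {term} =def. {article(genus)} {genus} which {differentia}"


def inherited_differentiae(term: str) -> list[str]:
    """Collect the differentiae of the term and of every defined term above it.

    Assumes the genus links are acyclic.
    """
    collected = []
    while term in DEFINITIONS:
        term, differentia = DEFINITIONS[term]
        collected.append(differentia)
    return collected


print(definition("human being"))
print(inherited_differentiae("human being"))
```

Because each definition names its parent, moving a term in the hierarchy forces its definition to change with it, which is what keeps definitions and hierarchy in step.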

Page 32: 1 How to Build an Ontology Barry Smith

32

GO is now adopting structured definitions which contain both genus and differentiae

Essence = Genus + Differentiae

neuron cell differentiation =

Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..)

Differentiae: acquires features of a neuron

Page 33: 1 How to Build an Ontology Barry Smith

33

Ontology alignment

One of the current goals of GO is to align:

Cell Types in GO                   with   Cell Types in the Cell Ontology

cone cell fate commitment                 retinal_cone_cell

keratinocyte differentiation              keratinocyte

adipocyte differentiation                 fat_cell

dendritic cell activation                 dendritic_cell

lymphocyte proliferation                  lymphocyte

T-cell homeostasis                        T_lymphocyte

garland cell differentiation              garland_cell

heterocyst cell differentiation           heterocyst

Page 34: 1 How to Build an Ontology Barry Smith

34

Alignment of the two ontologies will permit the generation of consistent and complete definitions

Cell type:

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.

Page 35: 1 How to Build an Ontology Barry Smith

35

Other Ontologies to be aligned with GO

Chemical ontologies

– 3,4-dihydroxy-2-butanone-4-phosphate synthase activity

Anatomy ontologies

– metanephros development

GO itself

– mitochondrial inner membrane peptidase activity

OBO core

Page 36: 1 How to Build an Ontology Barry Smith

36

eventually to comprehend all of OBO

Page 37: 1 How to Build an Ontology Barry Smith

37

Top Level OBO-UBO

continuants: objects, characteristics, spatial regions

occurrents: processes, temporal regions, spatio-temporal regions

Page 38: 1 How to Build an Ontology Barry Smith

38

Definitions should be intelligible to both machines and humans

Machines can cope with the full formal representation

Humans need modularity

Page 39: 1 How to Build an Ontology Barry Smith

39

Fifth Rule: Terms and relations should have clear definitions

These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality:

– actual cells, actual portions of cytoplasm, and so on

Page 40: 1 How to Build an Ontology Barry Smith

40

But

Some terms are primitive (cannot be defined)

AVOID CIRCULAR DEFINITIONS!

Avoid definitions of the forms:

An A is an A which is B (person = person with identity documents)

An A is the B of an A (heptolysis = the causes of heptolysis)
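Both forms share one detectable symptom: the defined term reappears inside its own definition. A rough check, offered only as a sketch (the function name is an assumption), looks for the term as a whole phrase in the definition text. It will also flag legitimate definitions that mention a compound containing the term, so its output is a list of candidates for review, not a verdict.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """True if the term occurs as a whole phrase inside its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The two examples from the slide are both caught.
print(is_circular("person", "person with identity documents"))
print(is_circular("heptolysis", "the causes of heptolysis"))
# A non-circular definition taken from the FMA slide passes.
print(is_circular("plasma membrane", "a cell part that surrounds the cytoplasm"))
```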

Page 41: 1 How to Build an Ontology Barry Smith

41

[Figure labels: types: substance, organism, animal, mammal, cat, siamese, frog; leaf type; instances]

Page 42: 1 How to Build an Ontology Barry Smith

42

Benefits of well-defined relationships

If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C).

Find all DNA binding proteins should also find all transcription factor proteins because

transcription factor is_a DNA binding protein
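The query on this slide can be sketched as a traversal of is_a links followed by a lookup of annotations. Everything named here is an assumption made for illustration: the edge list, the annotation table, and the protein names X, Y, Z are hypothetical.

```python
def subtypes(is_a_edges, root):
    """All types that reach root by following is_a links, root included."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, []).append(child)
    found, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


IS_A = [("transcription factor", "DNA binding protein")]

# Hypothetical annotations: item -> the type it was annotated with.
ANNOTATIONS = {
    "protein X": "transcription factor",
    "protein Y": "DNA binding protein",
    "protein Z": "kinase",
}


def find_all(type_name):
    wanted = subtypes(IS_A, type_name)
    return sorted(item for item, t in ANNOTATIONS.items() if t in wanted)


print(find_all("DNA binding protein"))
```

The search for DNA binding proteins returns protein X even though it was annotated only as a transcription factor, which is the cascade the slide describes.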

Page 43: 1 How to Build an Ontology Barry Smith

43

What happens when an ontology has no clear definition of A is_a B:

cancer documentation is_a cancer

disease prevention is_a disease

living subject is_a information object representing an animal or complex organism

individual allele is_a act of observation

Page 44: 1 How to Build an Ontology Barry Smith

44

[Figure repeated from slide 28: FMA hierarchy around the pleural sac, with is_a and part_of links]

Page 45: 1 How to Build an Ontology Barry Smith

45

How to define A is_a B

A is_a B =def.

all instances of A are as a matter of biological science also instances of B

here A and B are names of types in reality

Page 46: 1 How to Build an Ontology Barry Smith

46

How to define A is_a B

A is_a B =def.

for all a if a instance_of A, then a instance_of B
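Read against a table of instance data, this definition becomes a simple universal check. The sketch below is illustrative only: the function name and the instance names are assumptions, and a finite table can refute an is_a claim but cannot establish that it holds as a matter of biological science.

```python
def is_a_holds(a_type, b_type, instance_of):
    """True if every recorded instance of a_type is also an instance of b_type."""
    return all(b_type in types for types in instance_of.values() if a_type in types)


# Hypothetical instances with the types they instantiate.
INSTANCE_OF = {
    "tibbles": {"siamese", "cat", "mammal"},
    "rex": {"mammal"},
}

print(is_a_holds("cat", "mammal", INSTANCE_OF))   # no counterexample recorded
print(is_a_holds("mammal", "cat", INSTANCE_OF))   # rex is a counterexample
```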

Page 47: 1 How to Build an Ontology Barry Smith

47

Kinds of relations

Between types:

– is_a, part_of, ...

Between an instance and a type:

– this explosion instance_of the type explosion

Between instances:

– Mary’s heart part_of Mary

Page 48: 1 How to Build an Ontology Barry Smith

48

Part_of as a relation between types is more problematic than is standardly supposed

heart part_of human being ?

human heart part_of human being ?

human being has_part human testis ?

testis part_of human being ?

Page 49: 1 How to Build an Ontology Barry Smith

49

Definition of part_of as a relation between types

A part_of B =Def all instances of A are instance-level parts of some instance of B

human testis part_of adult human being

Page 50: 1 How to Build an Ontology Barry Smith

50

Instance level

this nucleus is adjacent to this cytoplasm

implies:

this cytoplasm is adjacent to this nucleus

Type level

nucleus adjacent_to cytoplasm

Not: cytoplasm adjacent_to nucleus

seminal vesicle adjacent_to urinary bladder

Not: urinary bladder adjacent_to seminal vesicle
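The asymmetry can be reproduced with the all-some reading of type-level relations: A adjacent_to B holds when every instance of A is adjacent to some instance of B. In the sketch below the instance names are hypothetical, and the second portion of cytoplasm is assumed to belong to a cell without a nucleus.

```python
def all_some(a_type, b_type, instance_of, related):
    """True if every instance of a_type is related to some instance of b_type."""
    b_instances = [y for y, t in instance_of.items() if t == b_type]
    return all(
        any((x, y) in related for y in b_instances)
        for x, t in instance_of.items()
        if t == a_type
    )


INSTANCE_OF = {"n1": "nucleus", "c1": "cytoplasm", "c2": "cytoplasm"}

# Instance-level adjacency is symmetric, so store each pair in both directions.
pairs = {("n1", "c1")}
ADJACENT = pairs | {(y, x) for x, y in pairs}

print(all_some("nucleus", "cytoplasm", INSTANCE_OF, ADJACENT))
print(all_some("cytoplasm", "nucleus", INSTANCE_OF, ADJACENT))
```

The instance-level relation is symmetric throughout, yet the type-level statement holds in one direction only, because c2 has no adjacent nucleus.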

Page 51: 1 How to Build an Ontology Barry Smith

51

Definitions of the all-some form allow cascading inferences

If A R1 B and B R2 C, then we know that every A stands in R1 to some B, but we know also that, whichever B this is, it can be plugged into the R2 relation

Page 52: 1 How to Build an Ontology Barry Smith

52

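[Figure: transformation_of. The same instance c is shown as an instance of type C at time t and of type C1 at time t1, along a time axis. Examples shown: pre-RNA, mature RNA; child, adult]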

Page 53: 1 How to Build an Ontology Barry Smith

53

transformation_of

A transformation_of B =Def. Every instance of A was at some earlier time an instance of B

adult transformation_of child

Page 54: 1 How to Build an Ontology Barry Smith

54

embryological development C

c at t c at t1

C1

Page 55: 1 How to Build an Ontology Barry Smith

55

C

c at t c at t1

C1

tumor development

Page 56: 1 How to Build an Ontology Barry Smith

56

[Figure: derives_from. Along a time axis, instance c of type C and instance c' of type C' at time t, and instance c1 of type C1 at time t1. Example: zygote derives_from ovum, sperm]

Page 57: 1 How to Build an Ontology Barry Smith

57

One main obstacle to integrating biological and experiment-generated data

Most ontologies have no facility for dealing with time and instances

Page 58: 1 How to Build an Ontology Barry Smith

58

EXPO: Experiment Ontology

Page 59: 1 How to Build an Ontology Barry Smith

59

representational style part_of experimental hypothesis

experimental actions part_of experimental design

Page 60: 1 How to Build an Ontology Barry Smith

60

tool part_of experimental design

(confuses object with specification)

Page 61: 1 How to Build an Ontology Barry Smith

61

hypothesis driven is_a Galilean

Page 62: 1 How to Build an Ontology Barry Smith

62

physical is_a scientific experiment

(avoid abbreviations)

Page 63: 1 How to Build an Ontology Barry Smith

63

admin info about experiment is_a scientific experiment

Page 64: 1 How to Build an Ontology Barry Smith

64

where is the top level? objects, processes, characteristics

Page 65: 1 How to Build an Ontology Barry Smith

65

is_a and part_of never cross categorial divides

(cf. tripartite organization of GO)

if A is_a B

then A is an object type iff B is an object type

then A is a process type iff B is a process type

then A is a characteristic type iff B is a characteristic type
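Once every type carries a top-level category, this rule can be checked edge by edge. The sketch below is an illustration under stated assumptions: the category assigned to each term is my own reading of the deck (for example, treating 'experimental design' as a characteristic and 'tool' as an object), not something the slides state in this form.

```python
# Assumed top-level category for each type.
CATEGORY = {
    "neuron cell differentiation": "process",
    "differentiation": "process",
    "tool": "object",
    "experimental design": "characteristic",
}

EDGES = [
    ("neuron cell differentiation", "is_a", "differentiation"),
    ("tool", "part_of", "experimental design"),
]


def crossing_edges(edges, category):
    """Return the is_a / part_of edges whose two ends differ in category."""
    return [e for e in edges if category[e[0]] != category[e[2]]]


print(crossing_edges(EDGES, CATEGORY))
```

Under these assumed categories the only edge reported is tool part_of experimental design, the link the earlier slide criticizes for confusing an object with its specification.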

Page 66: 1 How to Build an Ontology Barry Smith

66

Some thoughts on time

continuants vs. occurrents

objects, characteristics vs. processes

time

timeline

day

daytime

menstrual cycle

high tide

Page 67: 1 How to Build an Ontology Barry Smith

67

What is time?

Page 68: 1 How to Build an Ontology Barry Smith

68

Top Level OBO-UBO

continuants: objects, characteristics, spatial regions

occurrents: processes, temporal regions, spatio-temporal regions

Space = the largest spatial region

Time = the largest temporal region

Page 69: 1 How to Build an Ontology Barry Smith

69

Relative time, subjective time

terms describing (regions of) time in special (qualitative, perspective-dependent, landmark dependent) ways

tomorrow, yesterday

uptown, downtown

phase A trial

Wednesday

Page 70: 1 How to Build an Ontology Barry Smith

70

Characteristics are continuants

many characteristics have realizations, applications or executions, which are processes

plan

design

method

menstrual cycle

function

Page 71: 1 How to Build an Ontology Barry Smith

71

GlaxoSmithKline*

What we need is “industrial-strength” ontologies with a consistent and rich representation formalism that are amenable for use as an integration framework, and support reasoning capabilities. We anticipate that pharma’s need to bring together mountains of data and information and to properly analyse that information all depend on having a stable, well-developed semantic framework that links information/data and that allows reasoning systems to perform some of our more "mundane" analysis work.

*Robin McEntire

Page 72: 1 How to Build an Ontology Barry Smith

72

OBO Relation Ontology

“Relations in Biomedical Ontologies”, Genome Biology, Apr. 2005

relations for continuants behave differently from relations for processes

Page 73: 1 How to Build an Ontology Barry Smith

73

part_of for component types is time-indexed

A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t
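A minimal sketch of the time-indexed definition, assuming instance data is recorded as (instance, type, time) and (part, whole, time) triples. The function name, the data layout and the instance names are all hypothetical.

```python
def part_of_holds(a_type, b_type, instance_of_at, part_of_at):
    """Check the time-indexed definition of A part_of B against recorded data.

    instance_of_at: set of (instance, type, time) triples
    part_of_at:     set of (part, whole, time) triples
    """
    for a, a_t, t in instance_of_at:
        if a_t != a_type:
            continue
        # Instances of B that exist at the same time t.
        wholes = [b for b, b_t, tb in instance_of_at if b_t == b_type and tb == t]
        if not any((a, b, t) in part_of_at for b in wholes):
            return False
    return True


INSTANCE_OF_AT = {
    ("heart1", "human heart", 1), ("heart1", "human heart", 2),
    ("mary", "human being", 1), ("mary", "human being", 2),
}
PART_OF_AT = {("heart1", "mary", 1), ("heart1", "mary", 2)}

print(part_of_holds("human heart", "human being", INSTANCE_OF_AT, PART_OF_AT))
# Drop the parthood record for time 2: the type-level claim no longer holds.
print(part_of_holds("human heart", "human being", INSTANCE_OF_AT,
                    PART_OF_AT - {("heart1", "mary", 2)}))
```

The time index matters because a continuant can be part of a whole at one time and not at another, so the check has to be made separately at every time at which the part exists.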

Page 74: 1 How to Build an Ontology Barry Smith

74

part_of for process types is not time-indexed

A part_of B =def. given any particular a, if a is an instance of A, then there is some instance b of B such that a is an instance-level part_of b

Page 75: 1 How to Build an Ontology Barry Smith

75

Main Upper Level Ontologies

CYC

Cycorp (Austin, TX)

human being = partially tangible thing

SUO (Suggested Upper Ontology)

IEEE

monkey, body covering

DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)

BFO (Basic Formal Ontology)

Page 76: 1 How to Build an Ontology Barry Smith

76

SUO top level

Entity
– Physical
  • Object
    – SelfConnectedObject
      » Substance
      » CorpuscularObject
      » Food
    – Region
    – Collection
    – Agent
  • Process
– Abstract
  • SetOrClass
  • Relation
  • Quantity
    – Number
    – PhysicalQuantity
  • Attribute
  • Proposition

Page 77: 1 How to Build an Ontology Barry Smith

77

MIGS Specification Top Levels

Organism

Phenotype

Environment

Sample Process

Data Process