Post on 28-Mar-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Obol: Open Bio-Ontology Language Using grammars to extract and use implicit knowledge in the GO and OBO Chris Mungall Berkeley Drosophila Genome Project

Obol: Open Bio-Ontology Language

Using grammars to extract and use implicit knowledge in the GO and OBO

Chris Mungall

Berkeley Drosophila Genome Project / GO Consortium

Page 2

Obol

Obol is a system for discovering and reasoning over hidden knowledge in ontologies

Obol is useful for helping maintain cross-products in the Gene Ontology

Obol works by parsing syntax and semantics from GO and OBO terms

Page 3

Motivation: Ontology Maintenance

GO: 3 ontologies, 16k terms, 23k relationships

OBO: cell, biochemical, sequence and multiple anatomical ontologies

Many GO terms are combinatorial (cross-products)
  "regulation of neutrophil differentiation"

No explicit links between ontologies

Difficult to maintain manually

Page 4

Some Sample GO terms

"regulation of neutrophil differentiation"
"neutrophil differentiation"
"granulocyte differentiation"
"smooth muscle contraction"
"nucleolar chromatin"
"nucleolus"
"oxygen transport"
"negative regulation of interleukin-2 biosynthesis"
"oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced iron-sulfur protein as one donor, and incorporation of one atom of oxygen"

Page 5

Graph complexity

[Figure: graph of nine GO terms linked by is-a and part-of edges: biosynthesis, cytokine biosynthesis and interleukin-2 biosynthesis, each with its "regulation of" and "negative regulation of" form.]

Page 6

Automatic inference of relationships

Some relationships can be derived computationally…

…provided we have complete logical definitions

regulation ^ (regtype:negative) ^ (regprocess:biosynthesis ^ (makes:interleukin-2))

Tools exist for reasoning over these logical definitions, but…

Page 7

Generating logical definitions

Generating and maintaining logical definitions for GO/OBO is non-trivial

Obol exploits the highly regular grammatical structure of GO term names
  "regulation of X", never "X regulation"
  "Y biosynthesis", never "biosynthesis of Y"
  no stemming required

Obol derives candidate class definitions from term names, and performs basic reasoning over them

Page 8

Obol: parsing and reasoning

GO/OBO Term (lexical string)
  "interleukin-2 biosynthesis"

Class Definition(s) (may involve relationships to other OBO terms)
  biosynthesis^(makes:interleukin-2)

Inferences (using definitions and existing ontologies)
  "interleukin-2 biosynthesis" is_a "cytokine biosynthesis"
  inferred from: interleukin-2 is_a cytokine

Page 9

How Obol Works

term names are broken into lexical tokens (words) using a tokeniser

tokens are parsed using a grammar, generating parse trees

parse trees are turned into class definitions using transformation rules and property definitions

transformation is reversible

class definitions are reasoned over

implemented in XSB Prolog
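
The first step above, tokenising, can be sketched in plain Prolog. This is only an illustration, not Obol's actual tokeniser: the predicate names `tokenise/2`, `split_codes/2` and `codes_to_atoms/2` are hypothetical, and the sketch assumes tokens are separated by single spaces, with hyphenated words kept whole.

```prolog
% tokenise(+Name, -Tokens): split a term name on spaces into word atoms.
% Hypothetical helper, not part of Obol. Hyphenated words such as
% 'interleukin-2' stay as single tokens.
tokenise(Name, Tokens) :-
    atom_codes(Name, Codes),
    split_codes(Codes, Words),
    codes_to_atoms(Words, Tokens).

% split_codes(+Codes, -Words): break a code list at each space (code 32).
split_codes([], [[]]).
split_codes([32|Cs], [[]|Ws]) :- !, split_codes(Cs, Ws).
split_codes([C|Cs], [[C|W]|Ws]) :- split_codes(Cs, [W|Ws]).

% codes_to_atoms(+Words, -Atoms): convert each non-empty word to an atom,
% skipping the empty words produced by repeated spaces.
codes_to_atoms([], []).
codes_to_atoms([[]|Ws], As) :- !, codes_to_atoms(Ws, As).
codes_to_atoms([W|Ws], [A|As]) :- atom_codes(A, W), codes_to_atoms(Ws, As).
```

For example, `?- tokenise('negative regulation of interleukin-2 biosynthesis', T).` binds `T` to `[negative, regulation, of, 'interleukin-2', biosynthesis]`, a token list ready for the grammar step.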

Page 10

Word tokens

Obol uses an atomic vocabulary of word tokens

tokens are partitioned by ontology domain
  cell, anatomy, biological process, etc

tokens have a grammatical type
  adj, noun, prep, relational adj, special

vocabularies need not be correct or complete
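
A vocabulary of this kind could be stored as Prolog facts. The sketch below is an assumption about layout, not Obol's real schema: `token/3`, `words_of_type/2` and `unknown_words/2` are hypothetical names, and the domain and type assigned to each word are illustrative only.

```prolog
% token(Word, Domain, GrammaticalType): hypothetical vocabulary facts.
token(negative,        general,            adj).
token(regulation,      biological_process, noun).
token(biosynthesis,    biological_process, noun).
token(differentiation, biological_process, noun).
token(of,              general,            prep).
token(neutrophil,      cell,               noun).
token(coated,          general,            relational_adj).

% words_of_type(+Type, -Words): all vocabulary words of one grammatical type.
words_of_type(Type, Words) :-
    findall(W, token(W, _, Type), Words).

% unknown_words(+Tokens, -Unknown): tokens absent from the vocabulary.
% Because the vocabulary need not be complete, these are candidates for
% review rather than errors. (member/2 is preloaded in SWI-Prolog; in
% XSB it would need importing from the basics module.)
unknown_words(Tokens, Unknown) :-
    findall(T, (member(T, Tokens), \+ token(T, _, _)), Unknown).
```

With these facts, `?- unknown_words([regulation, of, granulocyte, differentiation], U).` gives `U = [granulocyte]`.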

Page 11

Computational Grammars

formal grammars can elucidate sentence structure

grammars transform token lists into parse trees

multiple parses may be possible

parses are reversible

a grammar is a collection of transformation rules

Page 12

A simple OBO term grammar

Term --> NP        e.g. negative regulation of interleukin-2 biosynthesis
NP --> NP PP       e.g. negative regulation of interleukin-2 biosynthesis
NP --> NOUN        e.g. interleukin; regulation; biosynthesis
NP --> NP-TOK      e.g. interleukin-2
NP --> ADJ NP      e.g. negative regulation
NP --> NP NP       e.g. interleukin-2 biosynthesis
PP --> PREP NP     e.g. of interleukin-2 biosynthesis

(subset of the whole OBO grammar)
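
The rules above map almost directly onto a Definite Clause Grammar. The following is a minimal sketch rather than Obol's own grammar: the word lists and the parse-tree functors (`np/1`, `np/2`, `pp/2`, `noun/1`, `tok/1`, `adj/1`, `prep/1`) are assumptions made for illustration. Because `NP --> NP PP` and `NP --> NP NP` are left-recursive, the sketch tables `np/3`, which needs a tabling-capable system such as XSB or SWI-Prolog.

```prolog
:- table np/3.   % np//1 becomes np/3; tabling lets left recursion terminate

% Hypothetical word lists for the example.
noun(regulation).  noun(biosynthesis).  noun(interleukin).
np_tok('interleukin-2').
adj(negative).
prep(of).

term(T)               --> np(T).                          % Term --> NP
np(np(NP, PP))        --> np(NP), pp(PP).                 % NP --> NP PP
np(np(noun(W)))       --> [W], { noun(W) }.               % NP --> NOUN
np(np(tok(W)))        --> [W], { np_tok(W) }.             % NP --> NP-TOK
np(np(adj(W), NP))    --> [W], { adj(W) }, np(NP).        % NP --> ADJ NP
np(np(NP1, NP2))      --> np(NP1), np(NP2).               % NP --> NP NP
pp(pp(prep(W), NP))   --> [W], { prep(W) }, np(NP).       % PP --> PREP NP
```

`?- phrase(term(T), ['interleukin-2', biosynthesis]).` yields the single tree `np(np(tok('interleukin-2')), np(noun(biosynthesis)))`. The longer token list `[negative, regulation, of, 'interleukin-2', biosynthesis]` yields more than one tree, which shows why multiple parses need pruning.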

Page 13

Applying grammar rules

negative regulation of interleukin-2 biosynthesis
adj noun prep noun tok noun

[Figure: parse tree built bottom-up over the tokens, labelled with the rules applied]
np -> n
np -> adj np
np -> np-tok
np -> np np
pp -> p np
np -> np pp
term -> np

Page 14

Generating Class Definitions

A parse tree shows the syntax structure of a term

A class definition is a description of the meaning of a term

An Obol classdef is a cross product (intersection) of necessary and sufficient conditions

Classdefs are generated from parse trees using tree transform rules and property descriptions

Classdefs can be exported using obo or OWL format

Page 15

Property definitions guide class construction

Property
  name: makes
  domain: biosynthesis
  range: substance
  grammar: np_modifier

[Figure: parse tree in which the np "interleukin-2" and the np "biosynthesis" combine into one np]

biosynthesis^(makes:interleukin-2)

Page 16

Property definitions guide class construction

Property
  name: regtype
  domain: regulation
  range: neg/pos
  grammar: np_modifier

[Figure: parse tree in which the adj "negative" and the np "regulation" combine into one np]

regulation^(regtype:negative)

Page 17

Property definitions guide class construction

Property
  name: regprocess
  domain: regulation
  range: biological_process
  grammar: prep(of)

[Figure: parse tree in which the np regulation^(regtype:negative) combines with a pp made of "of" and the np biosynthesis^(makes:IL-2)]

regulation^(regtype:negative)^(regprocess:biosynthesis^(makes:IL-2))
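
The way property definitions steer class construction can be sketched as follows. This is an illustrative reconstruction, not Obol's code: `property/4` encodes the three property definitions from the slides, while `has_type/2`, the simplified parse-tree shapes (`word/1`, `mod/2`, `pp/3`), the range name `polarity` (standing in for "neg/pos") and the `class(Stem, Properties)` representation are all assumptions.

```prolog
% property(Name, Domain, Range, GrammarContext), from the three slides.
property(makes,      biosynthesis, substance,          np_modifier).
property(regtype,    regulation,   polarity,           np_modifier).
property(regprocess, regulation,   biological_process, prep(of)).

% has_type(Word, Range): hypothetical typing of atomic words.
has_type('interleukin-2', substance).
has_type(negative,        polarity).
has_type(biosynthesis,    biological_process).

% classdef(+Tree, -ClassDef): recurse down a simplified parse tree.
%   word(W)             a single token
%   mod(Mod, Head)      a modifier placed before its head
%   pp(Head, Prep, Arg) a head followed by a prepositional phrase
classdef(word(W), class(W, [])).
classdef(mod(Mod, Head), class(Stem, [P=Filler|Ps])) :-
    classdef(Head, class(Stem, Ps)),
    classdef(Mod, Filler),
    fills(Stem, Filler, np_modifier, P).
classdef(pp(Head, Prep, Arg), class(Stem, Props)) :-
    classdef(Head, class(Stem, Ps)),
    classdef(Arg, Filler),
    fills(Stem, Filler, prep(Prep), P),
    append(Ps, [P=Filler], Props).

% fills(+Stem, +Filler, +Context, -P): choose the property whose domain,
% range and grammatical context all match.
fills(Stem, class(FillerStem, _), Context, P) :-
    property(P, Stem, Range, Context),
    has_type(FillerStem, Range).
```

Under these assumptions, the tree `pp(mod(word(negative), word(regulation)), of, mod(word('interleukin-2'), word(biosynthesis)))` produces `class(regulation, [regtype=class(negative,[]), regprocess=class(biosynthesis, [makes=class('interleukin-2',[])])])`, mirroring the expression built up across the three slides.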

Page 18

Unparseable terms and multi-parse terms

[Figure: chart of unparseable and multi-parse terms for biological process, molecular function and cellular component]

single-token terms excluded from this analysis

Page 19

Reasoning over class definitions

Using class definitions, we can:
  autocreate parentage for new terms
  check for missing relationships
  find inconsistencies between ontologies
  generate implicit orthogonal ontologies

Method:
  Use native OBOL rules (via prolog or DAG-Edit)
  OR use external reasoner; eg RACER, FaCT

Page 20

Finding missing relationships

Obol is run periodically on GO to check for missing IS A and PART OF relationships

Multiple parses produce false-positives

223 missing relationships added to GO

ToDo: increase specificity by improving vocabularies and property definitions
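
A periodic check of this kind amounts to comparing derived relationships with asserted ones. The sketch below assumes two hypothetical fact tables, `asserted/3` and `derived/3`; the entries are invented for illustration and do not reflect the real content of GO.

```prolog
% asserted(Child, Relation, Parent): hypothetical extract of the ontology.
asserted('chloroplast envelope', is_a, 'organelle envelope').
asserted('clathrin coat', part_of, 'clathrin-coated vesicle').

% derived(Child, Relation, Parent): hypothetical relationships proposed
% by reasoning over class definitions.
derived('chloroplast envelope', is_a, 'plastid envelope').
derived('clathrin-coated vesicle', has_part, 'clathrin coat').
derived('chloroplast envelope', is_a, 'organelle envelope').

inverse(part_of, has_part).
inverse(has_part, part_of).

% suggestion(-Rel, -Status): a derived relationship that is not asserted.
% Status is inverse_present when the ontology already states the inverse
% relationship, otherwise missing.
suggestion(rel(C, R, P), Status) :-
    derived(C, R, P),
    \+ asserted(C, R, P),
    (   inverse(R, Inv), asserted(P, Inv, C)
    ->  Status = inverse_present
    ;   Status = missing
    ).
```

`?- findall(S-St, suggestion(S, St), L).` then lists one `missing` suggestion and one `inverse_present` suggestion. Spotting false positives caused by multiple parses would still need curator review.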

Page 21

Obol sample report

nucleolar chromatin PART OF nucleus
clathrin-coated vesicle HAS PART clathrin coat
chromoplast membrane IS A plastid membrane
nuclear microtubule PART OF nucleus
vitamin E biosynthesis IS A vitamin E metabolism
uracil permease activity IS A permease activity
chloroplast envelope IS A plastid envelope
negative regulation of lipid biosynthesis IS A negative regulation of lipid metabolism
ketone body metabolism IS A ketone metabolism
dense nuclear body IS A nuclear body

[Slide callouts attached to individual report lines: "false positive!" and "inverse present"]

Page 22

Aligning to the OBO cell ontology

[Figure: the GO terms "cardioblast differentiation" and "cardiac cell differentiation" set against the cell ontology terms cardioblast, mesodermal cell, animal cell, cardiac muscle cell and muscle cell, with a DEVELOPS FROM link and one alignment marked "???"]

most differentiation terms align precisely; some don't…

Page 23

Deriving existing GO relationships

                                        Function  Process  Component
all relationships                           8002    13613       1691
obol-derivable relationships                 479     3055        346
non-trivially derivable relationships          0     1089         63

Page 24

Obol as an ontology curation tool

Obol can be used by GO curators in a variety of ways

"Behind the scenes"
  Iterative: GO curator receives periodic suggestion reports
  Continuous: GO curator uses OBOL interactively via DAG-Edit plugin

To help the transition to a fully specified ontology
  GO curators then maintain class definitions

Obol as a search tool?

Page 25

Problems to address

Integration with curation process

Memory usage

Syntax parsing
  chemical terms, long terms

Dealing with and, or and not

Generating text definitions

Word list maintenance
  solution: integrate with ontology maintenance

Ontology dependencies
  protein and generic anatomy ontologies needed
  Obol can be used to help generate these

Page 26

Conclusions

Obol is useful for maintaining large GO-style ontologies

combination of semantic parsing with reasoning is powerful
  benefits of both GO-style ontology development and formal reasoning

Page 27

Acknowledgements

Berkeley/GO: John Richter, Brad Marshall, Karen Eilbeck, Suzanna Lewis, Gerry Rubin

Jackson Labs/GO: David Hill, Joel Richardson, Judith Blake

GO Curators: Midori Harris, Jennifer Clark, Amelia Ireland, Jane Lomax

Manchester: Chris Wroe, Robert Stevens, Phillip Lord

J Michael Cherry, Michael Ashburner, all the GO Consortium

Page 28

The OBO Universe (partial)

[Figure: diagram of OBO ontologies: Phenotype, Sequence, Anatomy, Cell, Protein, Chemical, Component, Process, Function, Fish-Anat, Fly-Anat]

Page 29

Detecting inconsistencies

[Figure: the terms "X binding", "Y binding" and "Z binding" drawn beside the terms X, Y and Z]

Page 30

Prolog as an ontology language

% DATABASE OF FACTS
isa(carb_binding, binding).
isa(polysac_binding, carb_binding).
isa(chitin_binding, polysac_binding).
isa(cellulose_binding, polysac_binding).

% INFERENCE RULES
isaT(X,Y) :- isa(X,Y).
isaT(X,Y) :- isa(X,Z), isaT(Z,Y).

● ?- isaT(chitin_binding, binding).
● YES

● ?- isaT(X, polysac_binding).
● X=chitin_binding.
● X=cellulose_binding.

● ?- isaT(chitin_binding, cellulose_binding).
● NO

● ?- isaT(X,Y). % returns all paths

Page 31

Prolog internal representation

[Figure: DAG with the stem "regulation" linked by "regulates" to "contraction", with "negative" attached to "regulation", "muscle" attached to "contraction" and "smooth" attached to "muscle"]

class(regulation <process> qualifier=class(negative <general>) regulates=class(contraction <process> affects_cell_type=class(muscle <anatomical> has_type=class(smooth <general>))))

Page 32

Prolog Grammar Implementation

Prolog: the classic logic programming language
  High-level declarative language, natural choice for ontologies; built in "database"
  Definite Clause Grammars (DCGs) part of the language
  DCGs allow passing data up the parse tree

XSB Prolog
  Uses tabling (more efficient, less re-calculation)
  Tabling + DCGs = chart parsing (Earley's algorithm)
  This means we can have left-recursive grammars
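
The point about left recursion can be seen in a two-rule grammar. This is a minimal sketch with a hypothetical noun list. Without the `table` directive, the first rule would call itself forever under ordinary depth-first resolution; with tabling (as in XSB, or SWI-Prolog's equivalent directive) the query terminates.

```prolog
:- table np/3.   % np//1 translates to np/3; tabling memoises its calls

% Hypothetical noun list for the example.
noun(clathrin).  noun(coat).  noun(vesicle).

% Left-recursive rule: an NP is an NP followed by a noun.
np(np(NP, N)) --> np(NP), [N], { noun(N) }.
% Base case: a single noun.
np(np(N))     --> [N], { noun(N) }.
```

`?- phrase(np(T), [clathrin, coat, vesicle]).` gives the left-nested tree `T = np(np(np(clathrin), coat), vesicle)`. The tree itself is the data "passed up" through the DCG argument.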

Page 33

A Formal Grammar for OBO terms

All(?) GO/OBO terms are NOUN-PHRASES (exception: phenotypes?)

A NOUN-PHRASE is (recursively) made from
  a NOUN (includes inflected verbs; eg binding)
  an ADJECTIVE followed by a NOUN-PHRASE; eg inner membrane
  a NOUN-PHRASE preceded by a NOUN-PHRASE acting as ADJECTIVE; eg clathrin coat
  a NOUN-PHRASE then PREPOSITION then NOUN-PHRASE; eg regulation of transcription
  an (optional) NOUN-PHRASE then a RELATIONAL ADJECTIVE then a NOUN-PHRASE; eg clathrin-coated vesicle

Precedence rules are also required to prune parse forest

Simple but effective

Page 34

A formal grammar is a set of production rules operating over terminal symbols (eg words) and non-terminal symbols (eg word/phrase categories)

The rules determine how sequences of symbols can be transformed, making a parse tree

Sentence -> Subject Verb Object
Subject -> Article Noun
Object -> Article Noun
Article -> a | the
Verb -> ate | chased
Noun -> cat | banana | mouse

GENERATING: Start at top ('sentence') and apply rules until all symbols are terminal

PARSING: Start at bottom (a sequence of terminals); apply rules, combining symbols if necessary

[Figure: parse tree for a simple sentence, "the cat ate the banana"]
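
The toy grammar above can be written as a DCG that both parses and generates. This is a sketch: the tree functors (`s/3`, `subj/2`, `obj/2`, `art/1`, `v/1`, `n/1`) are names chosen here for illustration, and `member/2` is assumed to be available (it is preloaded in SWI-Prolog; XSB needs it imported from `basics`).

```prolog
sentence(s(S, V, O)) --> subject(S), verb(V), object(O).
subject(subj(A, N))  --> article(A), noun(N).
object(obj(A, N))    --> article(A), noun(N).

% Terminal symbols: each rule consumes one word from the list.
article(art(W)) --> [W], { member(W, [a, the]) }.
verb(v(W))      --> [W], { member(W, [ate, chased]) }.
noun(n(W))      --> [W], { member(W, [cat, banana, mouse]) }.
```

PARSING: `?- phrase(sentence(T), [the, cat, ate, the, banana]).` gives `T = s(subj(art(the), n(cat)), v(ate), obj(art(the), n(banana)))`.

GENERATING: `?- phrase(sentence(_), Words).` enumerates word lists on backtracking. The grammar allows 2 x 3 x 2 x 2 x 3 = 72 sentences.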

Page 35

negative regulation of smooth muscle contraction
ADJ NOUN P ADJ NOUN NOUN

Parse tree
  NP -> ADJ NOUN*
  NP -> NP NP*
  PP -> P NP*
  NP -> NP* PP

Class Definition (shown as DAG)

[Figure: DAG with the stem "regulation" linked by "regulates" to "contraction", with "negative" attached to "regulation", "muscle" attached to "contraction" and "smooth" attached to "muscle"; the sub-phrases "negative regulation of smooth muscle contraction", "smooth muscle contraction" and "smooth muscle" are marked on the parse tree]

the parse tree shows the syntactic structure

the classdef shows the logical structure

recurse down tree applying grammatical context rules to get property fillers

thick lines indicate stem terms

DAG definition is minimal; non-definitional relationships to other terms not shown

we need new OBO format (or OWL) to represent cross-products (aka intersections / complete defs)

Page 36

COPII-coated vesicle membrane

class(membrane <component> "COPII-coated vesicle membrane" part_of=class(vesicle <component> "COPII-coated vesicle" has_part=class(coat <component> "COPII coat" made_from=class(COPII <complex> "COPII"))))

class/term name shown in quotes; these can be derived by reversing the transformation

requires use of inverse properties (has_part vs part_of), supported in new OBO format

the above classdef is consistent with what is in the GO cellular_component ontology

Page 37

Outline

Motivation: combinatorial issues with composite terms

Approaches: annotation-time term composition vs tools for maintenance of large DAGs

The OBOL System
  Term decomposition using grammars
  Generating computable logical class definitions
  Rules and reasoning over class definitions

Initial Results

Strategies for using OBOL within GO/OBO

Page 38

Manual maintenance of GO

GO is 3 DAGs of over 16k terms

Large DAGs of terms are hard to maintain

"cross-products" produce combinatorial explosions and highly connected sub-graphs

GO terms include OBO terms
  eg oxygen binding; wing development

Zipf's Law
  Many terms not yet used in annotation (Ogren, pers. comm.)

Page 39

Example combinatorial explosion

[Figure: graph distinguishing actual GO terms from "implicit" terms, with edges labelled part_of and affects. Terms shown: contraction, muscle contraction, smooth muscle contraction, regulation, regulation of contraction, regulation of muscle contraction, regulation of smooth muscle contraction, negative regulation, negative regulation of contraction, negative regulation of muscle contraction, negative regulation of smooth muscle contraction, cell motility. Implicit anatomical terms not shown.]

Page 40

One Approach: Properties

One extreme solution is to remove composite terms from ontology altogether

Generate "anonymous" composite terms at annotation-time via property/slot restrictions to atomic terms

● binding ^ affects(interleukin-18)

● contraction ^ affects(muscle+type(smooth))

These bindings constitute a class definition

But it is still necessary to make statements about composite terms in the ontology

● macrophage activation is_a immune cell activity

● fibrinolysis is_a negative regulation of blood coagulation

Page 41

Another approach: Computationally aided ontology maintenance

GO terms exhibit regularity in their syntactic structure

Substring relationships highly correlated with actual relationships
  regulation of smooth muscle contraction
  smooth muscle contraction
  muscle contraction
  contraction

Ogren PV, Cohen KB, Acquaah-Mensah GK, Eberlein J, Hunter L. 2004. The compositional structure of Gene Ontology terms. Pac Symp Biocomput 9: 214-215.

Page 42

OBOL: Syntax and Semantics

What about using the term syntax to get at the meaning of the term?

GO/OBO Term
  "interleukin binding"

Class Definition (may involve relationships to other OBO terms)
  binding^affects(interleukin)

Inferences
  "interleukin binding" is_a "cytokine binding"
  inferred from: interleukin is_a cytokine

Page 43

Inference of intermediate terms and IS_As: example rule

class(regulation process_regulated=R qual=Q) is_a class(regulation process_regulated=R' qual=Q) <=> R is_a R'

FORALL classdef pairs, IFF the stem-class is the same AND all the property-values in the restriction-list are identical EXCEPT for one property, in which the property-values are linked by an isa, THEN the classdefs are linked by an isa

class(C P1=V1 P2=V2..Px=Vx Pn=Vn) is_a class(C P1=V1 P2=V2..Px=Vx' Pn=Vn) <=> Vx is_a Vx'
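
The general rule can be sketched in Prolog. This is an illustration and not Obol's native rule set: it represents a classdef as `class(Stem, [Property=Value, ...])`, which is an assumed concrete syntax for the slide notation, and it covers only the "if" direction of the rule. The `isa/2` facts are hypothetical, and the predicate names `classdef_isa/2`, `one_differs/2` and `value_isa/2` are invented here.

```prolog
% Hypothetical asserted is_a links between atomic terms.
isa('interleukin-2', cytokine).
isa(cytokine, protein).

% isa_t/2: transitive closure of isa/2.
isa_t(X, Y) :- isa(X, Y).
isa_t(X, Y) :- isa(X, Z), isa_t(Z, Y).

% classdef_isa(+Sub, +Super): same stem class, and the property lists
% agree everywhere except at one property whose values are linked by is_a.
classdef_isa(class(C, Ps1), class(C, Ps2)) :-
    one_differs(Ps1, Ps2).

% one_differs(+Ps1, +Ps2): walk both lists in step. The first clause
% handles the differing property (the remaining tails must be identical);
% the second skips a property whose value is the same on both sides.
one_differs([P=V1|Ps], [P=V2|Ps]) :- value_isa(V1, V2).
one_differs([PV|Ps1], [PV|Ps2])   :- one_differs(Ps1, Ps2).

% value_isa(+V1, +V2): values are linked either by an atomic is_a path
% or, for nested class definitions, by the same rule applied recursively.
value_isa(V1, V2) :- isa_t(V1, V2).
value_isa(V1, V2) :- classdef_isa(V1, V2).
```

For example, with `A = class(regulation, [process_regulated=class(biosynthesis, [makes='interleukin-2']), qual=negative])` and `B` identical except for `makes=cytokine`, the goal `classdef_isa(A, B)` succeeds and `classdef_isa(B, A)` fails.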