Principles for Building Biomedical Ontologies



  • Principles for Building Biomedical Ontologies
    Suzanna Lewis
    National Center for Biomedical Ontology
    22 October 2005
    Advanced Bioinformatics, Cold Spring Harbor

  • National Center for Biomedical Ontology
    Mark Musen, PI & Core 1: computer science (SMI)
    Suzanna Lewis, Co-PI & Core 2: bioinformatics (BiKR; GO)
    Barry Smith, Core 6: Outreach and training (ECOR)
    Sima Misra, Associate Program Director
    Daniel Rubin, Program Director
    Michael Ashburner, Core 3: Phenotype Project (Cambridge; FlyBase; and GO)
    Monte Westerfield, Core 3: Phenotype Project (UOregon; PI of ZFIN)
    Ida Sim, Core 3: HIV clinical trials Project (UCSF)

  • BiKRs
    Sima Misra
    Shu Shengqiang
    Christopher J. Mungall
    Nomi Harris
    John Day-Richter
    Karen Eilbeck
    Mark Gibson

  • Outline for the Morning
    A definition of ontology
    Four sessions:
        Organizational Challenges
        Principles for Ontology Construction
        Case Studies from the GO
        Case Studies for group discussion

  • My newbie questions
    What data is missing?
    Where is the data generated?
    What is the motivation?
    How will it be gathered?

    What I've heard
    Organism, environment, data quality and attribution
    TIGR, Sanger, JGI, and coming soon to a 954 near you!
    Still an issue. Low threshold of effort relative to benefits of complying
    Data is accumulating on disks across the world and we'd like to be able to locate and use it
    The hardest part: Sharing (semantics)

  • Ontologies help with decision making
    Handy ontology tells us what's there
    Where should I eat?

  • Type of cuisine
    (Presumable) country of origin
    Ontologies don't just organize data; they also facilitate inference, and that creates new knowledge, often unconsciously in the user.

  • Where delicatessen food hails from
    Frozen Yogurt: cuisine in search of a national identity?
    What a computer would likely infer about the world from this helpful ontology:
        Flag of fresh juice
        Fresh Juice is a national cuisine

  • Ontology is all about meaning
    Communities form (scientific) theories that seek to explain all of the existing evidence and can be used for prediction.
    We make inferences and decisions based upon what we know about (biological) reality.

  • Make our meanings clear enough for a computer to understand
    An ontology is a computable representation of this underlying (biological) reality.
    An ontology enables a computer to reason over the data in (some of) the ways that we do, particularly to query and locate relevant data.
    A shared, common, backbone taxonomy of relevant entities, and the relationships between them, within an application domain.
    Referred to by information scientists as an 'Ontology'.

  • But really, what is an Ontology?
    From Aristotle to Artificial Intelligence

    It is a formalism of what exists.
    It follows formal rules for creating definitions originally laid down by Aristotle.
    A definition is the specification of the essence (nature, invariant structure) shared by all the members of a class or natural kind.

  • The Aristotelian Methodology
    Topmost nodes are the undefinable primitives.
    The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia.
    The differentia tells us what marks out instances of the defined class within the wider parent class, as in:
        Plasma membrane is a cell part [immediate parent]
        that surrounds the cytoplasm [differentia]
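
The parent-plus-differentia pattern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OntologyClass` type and its fields are hypothetical names, not part of any ontology toolkit, and "cell part" is treated as a top node purely to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyClass:
    """A class defined by its immediate parent plus a differentia (hypothetical type)."""
    name: str
    parent: Optional["OntologyClass"] = None  # None marks an undefinable primitive
    differentia: Optional[str] = None

    def definition(self) -> str:
        # Topmost nodes have no definition of their own.
        if self.parent is None:
            return f"{self.name} is a primitive (undefined) class"
        return f"{self.name} is a {self.parent.name} that {self.differentia}"


# 'cell part' is a top node here only for the sake of the example.
cell_part = OntologyClass("cell part")
plasma_membrane = OntologyClass(
    "plasma membrane", parent=cell_part, differentia="surrounds the cytoplasm"
)

print(plasma_membrane.definition())
```

Keeping the parent and the differentia as separate fields, rather than one free-text string, is what lets a program check that every non-primitive class actually states both.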

  • Classes
    Figure labels: physical object (substance), mammal, frog, leaf class
    All members of the class frog share a froggy nature.

  • Anatomical structures
    Figure labels: Thorax, Lung, Heart, Cell
    Cornelius Rosse

  • Content of FMA
    Challenge: Duplicate graphical model in symbolic model
    Adapted from Bloom & Fawcett: Textbook of Histology, 1994, 12th ed., Chapman & Hall
    Universals or classes: Kinds of anatomical entities

  • Content of FMA

  • 1. Organizational Challenges

  • So you want an ontology
    What do you have to do to make/get/use/steal/beg one?

  • Flowchart: Why, Survey, Improve, Develop, Salvage, Collaborate & Learn; yes/no decision points: Domain covered? Public? Active? Applied? Community?

  • What you must do
    Justify exactly why there is a need
    Scope it very, very tightly
    Communicate with people

  • The decisions you must make
    What domain does it cover?
    Is it privately held?
    Is it active?
    Is it applied?

  • Flowchart: Why, Survey, Improve, Develop, Salvage, Collaborate & Learn (Listen to Barry); yes/no decision points: Domain covered? Public? Active? Applied? Community?

  • Due diligence & background research
    Step 1: Learn what is out there
    The most comprehensive list is on the OBO site: http://obo.sourceforge.net
    Assess ontologies critically and realistically.
    Make contact

  • Ontologies must be shared
    Proprietary ontologies
        Belief that ownership of the terminology gives the owners a competitive edge
        For example, Incyte or Monsanto in the past, SNOMED for non-US.
    Data cannot be shared if the ontologies describing the data are not shared.
    Don't reinvent
    Use the power of combination and collaboration

  • Pragmatic assessment of an ontology
    Is there access to help?
    Does a warm body answer help mail within a reasonable time, say 2 working days?

  • Use it to improve it
    Every ontology improves when it is applied to actual data.
    It improves even more when these data are used to answer questions.
    There will be fewer problems in the ontology, and more commitment to fixing remaining problems, when important research data that scientists depend upon is involved.
    Be very wary of ontologies that have never been applied.

  • Work with that community
    To improve (if you found one)
    To develop (if you did not)

    Getting it right
    It is impossible to get it right the 1st (or 2nd, or 3rd, ...) time.
    What we know about reality is continually growing.

  • Implication: prepare for change
    Establish a mechanism for change.
    Use CVS or Subversion.
    Changes must be reviewed by experts
    Unique Identifiers
    Versions
    Archives
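
As a rough illustration of what unique identifiers, versions, and archives buy you, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the `Ontology` and `Term` classes, the `EX:` prefix, and the seven-digit numbering are hypothetical, and a real project would keep its history in version control (CVS or Subversion, as the slide says) with expert review in front of every change.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    term_id: str
    name: str
    obsolete: bool = False


@dataclass
class Ontology:
    prefix: str = "EX"  # hypothetical identifier prefix
    version: int = 0
    terms: Dict[str, Term] = field(default_factory=dict)
    archive: List[Dict[str, Term]] = field(default_factory=list)
    _next_number: int = 1

    def _commit(self) -> None:
        """Archive the current state, then bump the version."""
        self.archive.append(copy.deepcopy(self.terms))
        self.version += 1

    def add_term(self, name: str) -> str:
        self._commit()
        term_id = f"{self.prefix}:{self._next_number:07d}"
        self._next_number += 1  # identifiers are never reused
        self.terms[term_id] = Term(term_id, name)
        return term_id

    def obsolete_term(self, term_id: str) -> None:
        self._commit()
        self.terms[term_id].obsolete = True  # keep the identifier; never delete it


onto = Ontology()
a = onto.add_term("plasma membrane")
b = onto.add_term("flagellum")
onto.obsolete_term(a)
c = onto.add_term("nucleoplasm")
print(a, b, c, onto.version)
```

Marking a term obsolete instead of deleting it means that data annotated with the old identifier can still be resolved after the ontology changes.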

  • Ontology development is hard
    It takes collaborators who:
        Have a stake in seeing it work.
        Have broad, detailed domain knowledge.
        Will engage in vigorous debate without engaging egos.
        Will do concrete work and attend frequent working sessions (quarterly), phone conferences (weekly), e-mail correspondence (daily).

  • 2. Principles for Ontology Construction

  • Why do we need rules for good ontology?
    Ontologies must be intelligible to humans (for annotation) and to machines (for reasoning and error-checking)
    Unintuitive rules for classification lead to entry errors (problematic links)
    Facilitate training of curators
    Overcome obstacles to alignment with other ontology and terminology systems
    Enhance harvesting of content through automatic reasoning systems
    Following basic rules makes more useful ontologies

  • Aristotle's categories
    This is Aristotle's list of types of predication, that is, the different ways in which things can be said to be.
    He identifies 10 mutually exclusive categories.

  • SNOMED-CT Top Level
    Substance
    Body Structure
    Specimen
    Context-Dependent Categories*
    Attribute
    Finding*
    Staging and Scales
    Organism
    Physical Object
    Events
    Environments and Geographic Locations
    Qualifier Value
    Special Concept*
    Pharmaceutical and Biological Products
    Social Context
    Disease
    Procedure
    Physical Force

  • Examples of Rules
    Don't confuse instances with universals
        Your navel (instance) is not the abstract representation of all navels
        Your microarray result is not the abstract representation of all microarray results
    The meaning of an ontology should not change when the programming language changes

  • First Rule: Univocity
    Terms (including those describing relations) should have the same meanings on every occasion of use.
    In other words, they should refer to the same kinds of instances in reality.

  • Example of a univocity problem in the case of the part_of relation
    (Old) Gene Ontology:
    part_of = may be part of
        flagellum part_of cell
    part_of = is at times part of
        replication fork part_of the nucleoplasm
    part_of = is included as a sub-list in

  • Second Rule: Positivity
    Complements of classes are not themselves classes.
    Terms such as non-mammal, or non-frog, or non-membrane do not designate genuine classes.

  • Third Rule: Objectivity
    Which classes exist is not a function of our biological knowledge.
    Terms such as unknown or unclassified do not designate biological natural kinds.
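
The positivity and objectivity rules lend themselves to a simple automated check on candidate term names. The sketch below is an assumption-laden illustration: `lint_term` and both regular expressions are hypothetical, and name matching alone will miss violations that are phrased differently.

```python
import re
from typing import List

# Hypothetical patterns; a real project would tune these to its own vocabulary.
NEGATIVE = re.compile(r"^non[- ]", re.IGNORECASE)
EPISTEMIC = re.compile(r"\b(unknown|unclassified)\b", re.IGNORECASE)


def lint_term(name: str) -> List[str]:
    """Return the rules a candidate term name appears to violate."""
    problems = []
    if NEGATIVE.search(name):
        problems.append("positivity")   # complement of a class, e.g. non-mammal
    if EPISTEMIC.search(name):
        problems.append("objectivity")  # reflects our knowledge, not a natural kind
    return problems


for candidate in ["non-mammal", "unclassified protein", "plasma membrane"]:
    print(candidate, lint_term(candidate))
```

A check like this is best used to flag names for a curator to look at, not to reject them automatically.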

  • Fourth Rule: Single Inheritance
    No class in a classificatory hierarchy should have more than one is_a parent on the immediate higher level.
    I.e., no diamonds

  • Following the single inheritance rule
    The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.
    The entire information content of the term hierarchy can be translated very cleanly into a computer representation.
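
Both points can be sketched with a toy hierarchy: one function reports terms with more than one is_a parent, and another walks up the single parent chain to collect what a term inherits. The links, the differentia strings, and the function names are all illustrative assumptions, not content from the Gene Ontology, and the hierarchy is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

# Toy is_a links as (child, parent) pairs; the terms are illustrative only.
IS_A: List[Tuple[str, str]] = [
    ("plasma membrane", "cell part"),
    ("cell part", "cellular component"),
]

# Hypothetical differentia text for each non-root term.
DIFFERENTIA: Dict[str, str] = {
    "plasma membrane": "surrounds the cytoplasm",
    "cell part": "is part of a cell",
}


def parents_of(links: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each child term to the list of its is_a parents."""
    parents: Dict[str, List[str]] = {}
    for child, parent in links:
        parents.setdefault(child, []).append(parent)
    return parents


def multiple_inheritance(links: List[Tuple[str, str]]) -> List[str]:
    """Terms with more than one is_a parent (the 'diamonds')."""
    return sorted(t for t, ps in parents_of(links).items() if len(ps) > 1)


def inherited_definition(term: str, links: List[Tuple[str, str]]) -> List[str]:
    """Collect differentiae from the term up to the root (single inheritance assumed)."""
    parents = parents_of(links)
    chain = []
    while term in parents:
        chain.append(DIFFERENTIA.get(term, "(no differentia recorded)"))
        term = parents[term][0]
    return chain


print(multiple_inheritance(IS_A))
print(inherited_definition("plasma membrane", IS_A))
print(multiple_inheritance(IS_A + [("plasma membrane", "membrane")]))
```

With a single parent per term the upward walk is unambiguous; once a second parent is added, the walk would have to choose a path, which is where the clean translation into a computer representation starts to break down.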

  • Problems with multiple inheritance
    Diagram: one class with two parents, B and C, reached through links labeled is_a1 and is_a2
    is_a no longer univocal

  • Fifth Rule: Clarity of Text Definitions
    The terms used in a definition should be simpler (more intelligible) than the term to be defined; otherwise the definition provides no assistance to human understanding.
    Machines can cope with the full formal representation (they don't need the text).

  • Sixth Rule: Basis in Reality
    When building or maintai

