biomedical ontology tutorial_atlanta_june2011_part1


Posted on 13-Jun-2015







1. How to Build a Biomedical Ontology. Success stories: the Gene Ontology (GO); SNOMED, ICD and other controlled vocabularies. Ontology design principles. Ontology applications. Barry Smith

2. Uses of ontology in PubMed abstracts
4. By far the most successful: GO (Gene Ontology)
6. Hierarchical view of GO representing relations between represented types
7. Gene Ontology: $100 million invested in literature and database curation using the Gene Ontology (GO), based on the idea of annotation; over 11 million annotations relating gene products (proteins) described in the UniProt, Ensembl and other databases to terms in the GO; multiple secondary uses because the ontology was not built to meet one specific set of requirements
8. GO provides a controlled system of terms for use in annotating (describing, tagging) data: multi-species, multi-disciplinary, open source, contributing to the cumulativity of scientific results obtained by distinct research communities; compare the use of kilograms, meters and seconds in formulating experimental results
9. Sample Gene Array Data
10. Semantic annotation of data: Where in the cell? What kind of molecular function? What kind of biological process?
11. Natural language labels to make the data cognitively accessible to human beings
12. Compare: legends for maps
13. Compare: legends for diagrams
14. Ontologies are legends for data
15. Compare: legends for maps
16. Ontologies are legends for images
17. What lesion? What brain function?
18. Ontologies are legends for databases (figure: databases MouseEcotope, GlyProt, DiabetInGene, GluChem; term: sphingolipid transporter activity)
19. Annotation using common ontologies yields integration of databases (figure: databases MouseEcotope, GlyProt, DiabetInGene, GluChem; term: Holliday junction helicase complex)
20. Annotation using common ontologies can support comparison of data
21. Annotation with the Gene Ontology supports reusability of data, search of data by humans, comparison of data, aggregation of data, and reasoning with data by humans and machines
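Slides 18 to 21 argue that annotating separate databases with terms from one common ontology is what lets their data be integrated and compared. The following is a minimal Python sketch of that idea, not a description of how GO annotations are actually stored: the database names and the two term labels come from the slides, while the record identifiers (`rec_m1` and so on) and the assignment of terms to records are hypothetical. Each database is indexed by the terms used to annotate it, so records from different sources meet wherever they share a term.

```python
from collections import defaultdict

# Hypothetical records: database names and term labels follow the slides,
# record IDs and which record carries which term are invented for illustration.
databases = {
    "MouseEcotope": {"rec_m1": {"Holliday junction helicase complex"},
                     "rec_m2": {"sphingolipid transporter activity"}},
    "GlyProt": {"rec_g7": {"Holliday junction helicase complex"}},
    "DiabetInGene": {"rec_d3": {"sphingolipid transporter activity"}},
    "GluChem": {"rec_c5": {"Holliday junction helicase complex"}},
}


def index_by_term(dbs):
    """Map each ontology term to every (database, record) annotated with it."""
    index = defaultdict(set)
    for db_name, records in dbs.items():
        for record_id, terms in records.items():
            for term in terms:
                index[term].add((db_name, record_id))
    return index


index = index_by_term(databases)
for term in sorted(index):
    # Records from different databases are grouped under the shared term.
    print(term, "->", sorted(index[term]))
```

Because the join key is the shared term rather than any one database's local vocabulary, adding a fifth database annotated with the same ontology needs no pairwise mappings; a real system would key on stable GO identifiers rather than on labels.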
23. The goal: virtual science. Consistent (non-redundant) annotation and cumulative (additive) annotation, yielding, by incremental steps, a virtual map of the entirety of reality that is accessible to computational reasoning
24. This goal is realizable if we have a common ontology framework: data can be retrieved, compared and integrated only to the degree that it is annotated using a common controlled vocabulary. Compare the role of seconds, meters and kilograms in unifying science
25. To achieve this end we have to engage in something like philosophy (?): Is this the right way to organize the top level of this portion of the GO? How does the top level of this ontology relate to the top levels of other, neighboring ontologies?
26. Strategy for doing this: see the world as organized via types/universals/categories which are hierarchically organized, and in relation to which statements can be formulated which are universally true of all instances, e.g. cell membrane part_of cell
27. Foundational Model of Anatomy Ontology (figure: fragment of the is_a and part_of hierarchy, from Anatomical Structure and Anatomical Space down through Organ, Organ Part, Organ Cavity, Tissue, Serous Sac and Pleural Sac to Pleura (Wall of Sac), Pleural Cavity, Parietal, Visceral and Mediastinal Pleura, Interlobar Recess and Mesothelium of Pleura)
28. (figure: types running from genera down to species, e.g. substance, organism, animal, mammal, cat, frog, siamese, with instances at the bottom)
30. The problem of continuity of care: patients move around (with thanks to http://dbmotion.com)
31. Synchronic and diachronic problems of semantic interoperability (across space and across time)
32. How can we link EHR 1 to EHR 2 in a reliable, trustworthy, useful way, which both systems can understand?
33. The ideal solution: the WHO International Classification of Diseases (figure: ICD linking EHR 1 and EHR 2)
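Slides 26 to 28 describe types arranged in an is_a hierarchy, about which statements can be made that hold for all instances. A minimal Python sketch, assuming a single-parent hierarchy read off the slide 28 figure (placing frog directly under animal is an assumption about that figure), shows how membership in higher types follows from is_a by transitivity.

```python
# Assumed single-parent is_a hierarchy read off the slide 28 figure,
# stored as child -> parent edges.
IS_A = {
    "siamese": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "frog": "animal",      # assumption: frog sits directly under animal
    "animal": "organism",
}


def ancestors(type_name, edges=IS_A):
    """Return every type that type_name is_a, nearest first.

    Follows is_a edges transitively; assumes the hierarchy has no cycles.
    """
    result = []
    while type_name in edges:
        type_name = edges[type_name]
        result.append(type_name)
    return result


def is_subtype(child, parent, edges=IS_A):
    """True if child is_a parent, directly or through intermediate types."""
    return parent in ancestors(child, edges)


print(ancestors("siamese"))          # ['cat', 'mammal', 'animal', 'organism']
print(is_subtype("frog", "mammal"))  # False
```

Because is_a is transitive, whatever is asserted of all mammals is inherited by cat and siamese. Real ontologies such as the FMA allow a type to have several parents, which would turn the dictionary into a graph and the loop into a graph traversal.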
34. ICD. PRO: de facto US billing standard; multilanguage. CON: de facto US billing standard (corrupts data); no definitions of terms, and so difficult to judge the accuracy of the hierarchy and of coding; inconsistent hierarchies; hard to reason with results; hence few secondary uses, e.g. for research
35. ICD 11: the (ontology-based) plan, with multiple views including billing, public health statistics, research and SNOMED compatibility
36. The ideal solution: a single universal clinical vocabulary (figure: SNOMED CT linking EHR 1 and EHR 2)
37. SNOMED CT: Systematized Nomenclature of Medicine - Clinical Terms. PRO: international standard (sort of); huge resource; free for member countries; multi-language (including Spanish)
38. SNOMED CT. CON: huge (but redundant ... and gappy); contains many examples of false synonymy; still in need of work: no consistent interpretation of relations, many erroneous relation assertions, many idiosyncratic relations; mixes ontology with epistemology; contains numerous compound terms (e.g., test for X) without the constituent terms (here: X), even where the latter are of obvious salience
39. SNOMED CT: coding with SNOMED CT is unreliable and inconsistent; a multi-stage, multi-committee process for adding terms follows intuitive rules and not formal principles. Does there exist a strategy for evolutionary improvement?
40. And above all: SNOMED CT cannot solve the problem of continuity of care because it has too much redundancy
41. AND because it is used only in certain countries
42. Unified Medical Language System (UMLS): link EHR 1 to EHR 2 through a snapshot of the patient's condition which both systems can understand
43. Unified Medical Language System (UMLS): UMLS is not unified, not a language, not a system (and not only medical); it is an aggregation. If we use something like UMLS as a reference terminology, we will not solve the translation problem (figure: EN, DE)
44. UMLS approach to countering silo formation: by linking between different clinical or biomedical vocabularies. However: "the Metathesaurus does not represent a comprehensive NLM-authored ontology of biomedicine or a single consistent view of the world. The Metathesaurus preserves the many views of the world present in its source vocabularies because these different views may be useful for different tasks." (New York State Center of Excellence in Bioinformatics & Life Sciences)
46. Prospective standardization is a good thing. Prospective standardization is the only thing which will work in mission-critical domains. Prospective standardization means that certain limits to tolerance must be imposed. There is a need for top-down governance to ensure a common architecture and the resolution of border disputes in areas of overlap between domains
47. Principles of Best Practice in Ontology Development
48. The problem of ensuring sensible cooperation in a massively interdisciplinary community. Consider the multiple uses of technical terms such as type, concept, instance, model, representation, data
49. Three levels. L3: words, models (published representations, ontologies, databases ...). L2: ideas (concepts, thoughts, memories, ...). L1: things (cells, planets, processes of cell division ...)
50. Entity =def anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (entities on levels 1, 2 and 3)
51. First basic distinction among entities: type vs. instance (science text vs. diary) (human being vs. Tom Cruise)
52. For ontologies it is generalizations that are important = types, universals, kinds, species
53. Catalog vs. inventory: A 515287 DC3300 Dust Collector Fan; B 521683 Gilmer Belt; C 521682 Motor Drive Belt
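Slides 51 to 53 contrast types with instances using a parts catalog: a catalog lists kinds of part, an inventory lists particular items. A minimal Python sketch under that reading, where the three catalog entries come from slide 53 and the inventory serial numbers and counts are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A type: one entry per kind of part, however many copies exist."""
    part_number: str
    description: str


@dataclass(frozen=True)
class InventoryItem:
    """An instance: one particular physical item of some catalog type."""
    serial: str       # hypothetical serial numbers, not from the slides
    part_number: str  # the catalog type this item instantiates


# Catalog entries taken from slide 53.
CATALOG = {
    e.part_number: e
    for e in [
        CatalogEntry("515287", "DC3300 Dust Collector Fan"),
        CatalogEntry("521683", "Gilmer Belt"),
        CatalogEntry("521682", "Motor Drive Belt"),
    ]
}

# Hypothetical inventory: two belts of one type, one fan.
INVENTORY = [
    InventoryItem("SN-0001", "521683"),
    InventoryItem("SN-0002", "521683"),
    InventoryItem("SN-0003", "515287"),
]


def count_instances(part_number, inventory=INVENTORY):
    """How many particular items instantiate the given catalog type."""
    return sum(1 for item in inventory if item.part_number == part_number)


def type_of(item, catalog=CATALOG):
    """Look up the description of the type an inventory item instantiates."""
    return catalog[item.part_number].description
```

The catalog stays the same size however many belts are bought or used up; only the inventory changes. In the terms of the slides, an ontology is the catalog, not the inventory.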
54. An ontology is a representation of types. We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories: experiments relate to what is particular; science describes what is general
55. Ontology =def a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent (1) types in reality and (2) those relations between these types which obtain universally (= for all instances), e.g. lung is_a anatomical structure, lobe of lung part_of lung, in accordance with our best current established science
56. (figure: types such as object, organism, animal, mammal, cat, frog, siamese, above their instances)
57. Domain =def a portion of reality that forms the subject matter of a single science or technology or mode of study or administrative practice, e.g. proteomics, epidemiology, C2 M&S
58. Representation =def an image, idea, map, picture, name or description ... of some entity or entities
59. Ontologies are representational artifacts, comparable to science texts and subject to the same sorts of constraints (including the need for updating)
60. Representational units =def terms, icons, alphanumeric identifiers ... which refer, or are intended to refer, to entities and which are minimal (atoms)
61. Composite representation =def a representation (1) built out of representational units which (2) form a structure that mirrors, or is intended to mirror, the entities in some domain
62. The Periodic Table
63. Ontologies are here
64. or here
65. Ontologies represent general structures in reality (leg)
66. Ontologies do not represent concepts in people's heads
67. They represent types in reality
68. How do we know which general terms designate types? Types are repeatables: cell, electron, weapon, F16 ... Instances are one-off: Bill Clinton, this laptop, this handwave
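Slide 55 says that the relations an ontology asserts between types obtain universally, that is, for all instances. One common reading of `lobe of lung part_of lung` is that every instance of lobe of lung is part of some instance of lung. The minimal Python sketch below checks instance records against that reading; only the two type-level assertions come from the slide, while the instance data (`lung_1`, `lobe_1`, `lobe_2`) is hypothetical.

```python
# Type-level assertions taken from slide 55; they are meant to hold
# for all instances of the types involved.
IS_A = {("lung", "anatomical structure")}
PART_OF = {("lobe of lung", "lung")}

# Hypothetical instance-level data (not from the slides).
INSTANCE_OF = {
    "lung_1": "lung",
    "lobe_1": "lobe of lung",
    "lobe_2": "lobe of lung",
}
INSTANCE_PART_OF = {("lobe_1", "lung_1")}


def instances_of(type_name, instance_of=INSTANCE_OF, is_a=IS_A):
    """Instances of type_name, including instances of its direct subtypes."""
    types = {type_name} | {child for child, parent in is_a if parent == type_name}
    return {inst for inst, t in instance_of.items() if t in types}


def violations(part_of=PART_OF, instance_of=INSTANCE_OF, links=INSTANCE_PART_OF):
    """List instances that break a universal 'A part_of B' assertion.

    'A part_of B' is read as: every instance of A is part_of some instance of B.
    """
    found = []
    for part_type, whole_type in sorted(part_of):
        wholes = instances_of(whole_type, instance_of)
        for inst in sorted(instances_of(part_type, instance_of)):
            if not any((inst, whole) in links for whole in wholes):
                found.append((inst, part_type, whole_type))
    return found


print(instances_of("anatomical structure"))  # {'lung_1'}
print(violations())  # [('lobe_2', 'lobe of lung', 'lung')]
```

Here `lobe_2` is flagged because nothing records which lung it belongs to. The checker only looks one is_a level down, which suffices for this two-assertion example but not for a full hierarchy.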

