![Page 1: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/1.jpg)
Ontologies and data integration in biomedicine
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Kno.e.sis, Wright State University, Dayton, Ohio
May 27, 2009
![Page 2: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/2.jpg)
Lister Hill National Center for Biomedical Communications 2
Outline
• Why integrate data?
• Ontologies and data integration
• Examples
• Challenging issues
![Page 3: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/3.jpg)
Why integrate data?
![Page 4: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/4.jpg)
Why integrate data?
• Sources of information
  • Created by
    • Independent researchers
    • Separate workflows
  • Heterogeneous
  • Scattered
  • “Silos”
• To identify patterns in integrated datasets
  • Hypothesis generation
  • Knowledge discovery
![Page 5: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/5.jpg)
Motivation
• Translational research
  • “Bench to Bedside”
  • Integration of clinical and research activities and results
  • Supported by research programs
    • NIH Roadmap
    • Clinical and Translational Science Awards (CTSA)
• Requires the effective integration and exchange of information between
  • Basic research
  • Clinical research
![Page 6: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/6.jpg)
Genotype and phenotype
[Goh, PNAS 2007]
• OMIM
• [HPO]
![Page 7: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/7.jpg)
Genes and environmental factors
[Liu, BMC Bioinf. 2008]
• MEDLINE (MeSH index terms)
• Genetic Association Database
![Page 8: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/8.jpg)
Integrating drugs and targets
[Yildirim, Nature Biot. 2007]
• DrugBank
• ATC
• Gene Ontology
![Page 9: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/9.jpg)
Why ontologies?
![Page 10: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/10.jpg)
Uses of biomedical ontologies
• Knowledge management
  • Annotating data and resources
  • Accessing biomedical information
  • Mapping across biomedical ontologies
• Data integration, exchange and semantic interoperability
• Decision support
  • Data selection and aggregation
  • Decision support
  • NLP applications
  • Knowledge discovery
[Bodenreider, YBMI 2008]
![Page 11: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/11.jpg)
Terminology and translational research
[Figure labels: Cancer basic research; EHR, cancer patients; NCI Thesaurus; SNOMED CT]
![Page 12: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/12.jpg)
Approaches to data integration (1)
• Warehousing
  • Sources to be integrated are transformed into a common format and converted to a common vocabulary
• Mediation
  • Local schema (of the sources)
  • Global schema (in reference to which the queries are made)
[Stein, Nature Rev. Gen. 2003][Hernandez, SIGMOD Rec. 2004]
[Goble J. Biomedical Informatics 2008]
![Page 13: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/13.jpg)
Approaches to data integration (2)
• Linked data
  • Links among data elements
  • Enable navigation by humans
[Stein, Nature Rev. Gen. 2003][Hernandez, SIGMOD Rec. 2004]
[Goble J. Biomedical Informatics 2008]
![Page 14: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/14.jpg)
Ontologies and warehousing
• Role
  • Provide a conceptualization of the domain
    • Help define the schema
    • Information model vs. ontology
  • Provide value sets for data elements
  • Enable standardization and sharing of data
• Examples
  • Annotations to the Gene Ontology
  • BioWarehouse
  • Clinical information systems
http://biowarehouse.ai.sri.com/
![Page 15: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/15.jpg)
Ontologies and mediation
• Role
  • Reference for defining the global schema
  • Map between local and global schemas
    • Query reformulation
    • Local-as-view vs. Global-as-view
• Examples
  • TAMBIS
  • BioMediator
  • OntoFusion
[Stevens, Bioinformatics 2000]
[Louie, AMIA 2005]
[Perez-Rey, Comput Biol Med 2006]
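As an illustration of query reformulation in a mediator, here is a minimal sketch in a global-as-view style: each global attribute is defined in terms of fields of the local sources. The two sources, their field names, and the `query_global` helper are hypothetical and are not taken from TAMBIS, BioMediator, or OntoFusion.

```python
# Hypothetical local sources, each with its own schema.
SOURCE_A = [{"gene_symbol": "ALDH2", "organism": "Homo sapiens"}]
SOURCE_B = [{"sym": "Aldh2", "taxon": "Mus musculus"}]
SOURCES = {"A": SOURCE_A, "B": SOURCE_B}

# Global-as-view mappings: global attribute -> local field, per source.
MAPPINGS = {
    "A": {"symbol": "gene_symbol", "species": "organism"},
    "B": {"symbol": "sym", "species": "taxon"},
}


def query_global(**criteria):
    """Answer a query phrased in the global schema by rewriting it per source."""
    results = []
    for name, rows in SOURCES.items():
        mapping = MAPPINGS[name]
        # Reformulate: translate global attribute names into local field names.
        local_criteria = {mapping[attr]: value for attr, value in criteria.items()}
        for row in rows:
            if all(row.get(field) == value for field, value in local_criteria.items()):
                # Translate the matching row back into the global schema.
                results.append({attr: row[field] for attr, field in mapping.items()})
    return results


mouse_genes = query_global(species="Mus musculus")
```

The caller only sees the global attributes `symbol` and `species`; the local field names stay hidden behind the mappings, which is where an ontology would serve as the reference.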
![Page 16: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/16.jpg)
Ontologies and linked data
• Role
  • Explicit conceptualization of the domain
  • Semantic normalization of data elements
• Examples
  • Entrez
  • Semantic Web mashups
  • Bio2RDF
[http://www.ncbi.nlm.nih.gov/]
[J. Biomedical informatics 41(5) 2008]
[http://bio2rdf.org/]
![Page 17: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/17.jpg)
Ontologies and data integration
• Source of identifiers for biomedical entities
  • Semantic normalization
  • Warehouse approaches
• Source of reference relations for the global schema
  • Mapping between local and global schemas
  • Mediator-based approaches
• Source of identifiers for biomedical entities
  • Semantic normalization
  • Explicit conceptualization of the domain
  • Linked data approaches
![Page 18: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/18.jpg)
Ontologies and data aggregation
• Source of hierarchical relations
  • Aggregate data into coarser categories
  • Abstract away from low-frequency, fine-grained data points
  • Increase power
  • Improve visualization
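Aggregation along hierarchical relations can be sketched as a roll-up of counts from fine-grained terms to their ancestors. The small is-a hierarchy and the counts below are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical is-a hierarchy (child -> parent); None marks the root.
PARENT = {
    "type 1 diabetes": "diabetes mellitus",
    "type 2 diabetes": "diabetes mellitus",
    "diabetes mellitus": "endocrine disease",
    "Addison disease": "endocrine disease",
    "endocrine disease": None,
}


def ancestors_or_self(term):
    """Yield the term itself, then each ancestor up to the root."""
    while term is not None:
        yield term
        term = PARENT[term]


def roll_up(counts):
    """Add each term's count to the term and to all of its ancestors."""
    totals = Counter()
    for term, n in counts.items():
        for t in ancestors_or_self(term):
            totals[t] += n
    return totals


# Hypothetical observations annotated with fine-grained terms.
observed = {"type 1 diabetes": 3, "type 2 diabetes": 12, "Addison disease": 1}
rolled = roll_up(observed)
```

A term with a single observation contributes little on its own, but after the roll-up it counts towards the coarser category, which is the sense in which aggregation can increase power.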
![Page 20: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/20.jpg)
Annotating data
• Gene Ontology
  • Functional annotation of gene products in several dozen model organisms
• Various communities use the same controlled vocabularies
• Enabling comparisons across model organisms
• Annotations
  • Assigned manually by curators
  • Inferred automatically (e.g., from sequence similarity)
![Page 21: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/21.jpg)
GO Annotations for Aldh2 (mouse)
http://www.informatics.jax.org/
![Page 22: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/22.jpg)
GO ALD4 in Yeast
http://db.yeastgenome.org/
![Page 23: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/23.jpg)
GO Annotations for ALDH2 (Human)
http://www.ebi.ac.uk/GOA/
![Page 24: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/24.jpg)
Integration applications
• Based on shared annotations
  • Enrichment analysis (within/across species)
  • Clustering (co-clustering with gene expression data)
• Based on the structure of GO
  • Closely related annotations
  • Semantic similarity
• Based on associations between gene products and annotations
  • Leveraging reasoning
[Bodenreider, PSB 2005]
[Sahoo, Medinfo 2007]
[Lord, PSB 2003]
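Enrichment analysis over shared annotations is commonly done with a one-sided hypergeometric test; the slide does not name a specific test, so that choice is an assumption here. The sketch below computes the probability of seeing at least `hits` annotated genes in a sample; the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Hypergeometric tail P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     genes in the list being tested
    hits:       genes in the list annotated with the GO term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical numbers: all 5 genes in the list carry a term that
# annotates 5 of 10 background genes.
p = enrichment_p_value(population=10, annotated=5, sample=5, hits=5)
```

In practice such p-values are computed for many GO terms at once, so a multiple-testing correction is usually applied afterwards.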
![Page 25: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/25.jpg)
Integration Entrez Gene + GO
[Figure labels: Gene Ontology; Entrez Gene; gene; GO; PubMed; Gene name; OMIM; Sequence; Interactions; Glycosyltransferase; Congenital muscular dystrophy]
[Sahoo, Medinfo 2007]
![Page 26: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/26.jpg)
From glycosyltransferase to congenital muscular dystrophy
[Figure: graph with nodes EG:9215 (LARGE), GO:0008375 (acetylglucosaminyl-transferase), GO:0008194, GO:0016758, GO:0016757 (glycosyltransferase) and MIM:608840 (Muscular dystrophy, congenital, type 1D), connected by has_molecular_function, has_associated_phenotype and isa relations]
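The path from a molecular function to a disease can be sketched as a small graph traversal. The identifiers and relation names come from the slide; the exact layout of the isa edges is an assumption reconstructed from the flattened figure, and the helper functions are hypothetical.

```python
# isa edges between GO terms (child -> parents); edge layout is assumed.
ISA = {
    "GO:0008375": ["GO:0008194", "GO:0016758"],  # acetylglucosaminyltransferase
    "GO:0008194": ["GO:0016757"],
    "GO:0016758": ["GO:0016757"],                # GO:0016757 = glycosyltransferase
}
HAS_MOLECULAR_FUNCTION = {"EG:9215": ["GO:0008375"]}      # gene LARGE
HAS_ASSOCIATED_PHENOTYPE = {"EG:9215": ["MIM:608840"]}    # MDC1D


def is_a(term, target):
    """True if term equals target or reaches it by following isa edges."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(ISA.get(current, []))
    return False


def phenotypes_for_function(target):
    """Genes whose function is a kind of `target`, with their phenotypes."""
    hits = set()
    for gene, functions in HAS_MOLECULAR_FUNCTION.items():
        if any(is_a(function, target) for function in functions):
            for phenotype in HAS_ASSOCIATED_PHENOTYPE.get(gene, []):
                hits.add((gene, phenotype))
    return sorted(hits)


links = phenotypes_for_function("GO:0016757")
```

The gene is annotated only with the specific term GO:0008375; it is the isa reasoning step that connects it to the broader class "glycosyltransferase".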
![Page 28: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/28.jpg)
Cancer Biomedical Informatics Grid
• US National Cancer Institute
• Common infrastructure used to share data and applications across institutions to support cancer research efforts in a grid environment
• Service-oriented architecture
  • Data and application services available on the grid
• Supported by ontological resources
![Page 29: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/29.jpg)
caBIG services
• caArray
  • Microarray data repository
• caTissue
  • Biospecimen repository
• caFE (Cancer Function Express)
  • Annotations on microarray data
• …
• caTRIP
  • Cancer Translational Research Informatics Platform
  • Integrates data services
![Page 30: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/30.jpg)
Ontological resources
• NCI Thesaurus
  • Reference terminology for the cancer domain
  • ~ 60,000 concepts
  • OWL Lite
• Cancer Data Standards Repository (caDSR)
  • Metadata repository
  • Used to bridge across UML models through Common Data Elements
  • Links to concepts in ontologies
![Page 31: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/31.jpg)
Examples
Semantic Web for Health Care and Life Sciences
http://www.w3.org/2001/sw/hcls/
![Page 32: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/32.jpg)
Semantic Web layer cake
![Page 33: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/33.jpg)
Linked data
linkeddata.org
![Page 34: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/34.jpg)
Linked data
![Page 35: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/35.jpg)
Linked biomedical data
[Tim Berners-Lee TED 2009 conference]
http://www.w3.org/2009/Talks/0204-ted-tbl/#(1)
![Page 36: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/36.jpg)
W3C Health Care and Life Sciences IG
![Page 37: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/37.jpg)
Biomedical Semantic Web
• Integration
  • Data/Information
  • E.g., translational research
• Hypothesis generation
  • Knowledge discovery
[Ruttenberg, BMC Bioinf. 2007]
![Page 38: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/38.jpg)
HCLS mashup of biomedical sources
NeuronDB
BAMS
NC Annotations
Homologene
SWAN
Entrez Gene
Gene Ontology
Mammalian Phenotype
PDSPki
BrainPharm
AlzGene
Antibodies
PubChem
MeSH
Reactome
Allen Brain Atlas
Publications
http://esw.w3.org/topic/HCLS/HCLSIG_DemoHomePage_HCLSIG_Demo
![Page 39: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/39.jpg)
Shared identifiers Example
GO
![Page 40: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/40.jpg)
HCLS mashup
[Figure: sources in the mashup and the kinds of entities each one covers]
• NeuronDB: Protein (channels/receptors), Neurotransmitters, Neuroanatomy, Cell, Compartments, Currents
• BAMS: Protein, Neuroanatomy, Cells, Metabolites (channels), PubMedID
• NC Annotations: Genes/Proteins, Processes, Cells (maybe), PubMed ID
• Allen Brain Atlas: Genes, Brain images, Gross anatomy -> neuroanatomy
• Homologene: Genes, Species, Orthologies, Proofs
• SWAN: PubMedID, Hypothesis, Questions, Evidence, Genes
• Entrez Gene: Genes, Protein, GO, PubMedID, Interaction (g/p), Chromosome, C. location
• GO: Molecular function, Cell components, Biological process, Annotation gene, PubMedID
• Mammalian Phenotype: Genes, Phenotypes, Disease, PubMedID
• PDSPki: Proteins, Chemicals, Neurotransmitters
• BrainPharm: Drug, Drug effect, Pathological agent, Phenotype, Receptors, Channels, Cell types, PubMedID, Disease
• AlzGene: Gene, Polymorphism, Population, Alz Diagnosis
• Antibodies: Genes, Antibodies
• PubChem: Name, Structure, Properties, MeSH term
• MeSH: Drugs, Anatomy, Phenotypes, Compounds, Chemicals, PubMedID, PubChem
• Reactome: Genes/proteins, Interactions, Cellular location, Processes (GO)
![Page 41: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/41.jpg)
![Page 42: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/42.jpg)
HCLS mashups
• Based on RDF/OWL
• Based on shared identifiers
  • “Recombinant data” (E. Neumann)
• Ontologies used in some cases
• Support applications (SWAN, SenseLab, etc.)
• Journal of Biomedical Informatics special issue on Semantic bio-mashups [J. Biomedical Informatics 41(5) 2008]
![Page 43: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/43.jpg)
Semantic bio-mashups
• Bio2RDF: Towards a mashup to build bioinformatics knowledge systems
• Identifying disease-causal genes using Semantic Web-based representation of integrated genomic and phenomic knowledge
• Schema driven assignment and implementation of life science identifiers (LSIDs)
• The SWAN biomedical discourse ontology
• An ontology-driven semantic mashup of gene and biological pathway information: Application to the domain of nicotine dependence
• Towards an ontology for sharing medical images and regions of interest in neuroimaging
• yOWL: An ontology-driven knowledge base for yeast biologists
• Dynamic sub-ontology evolution for traditional Chinese medicine web ontology
• Ontology-centric integration and navigation of the dengue literature
• Infrastructure for dynamic knowledge integration - Automated biomedical ontology extension using textual resources
• An ontological knowledge framework for adaptive medical workflow
• Semi-automatic web service composition for the life sciences using the BioMoby semantic web framework
• Combining Semantic Web technologies with Multi-Agent Systems for integrated access to biological resources
[J. Biomedical Informatics 41(5) 2008]
![Page 44: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/44.jpg)
Challenging issues
![Page 45: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/45.jpg)
Challenging issues
• Bridges across ontologies
• Permanent identifiers for biomedical entities
• Other issues
![Page 46: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/46.jpg)
Challenging issues
Bridges across ontologies
![Page 47: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/47.jpg)
Trans-namespace integration
[Figure: the same disease identified differently in three namespaces]
• Biomedical literature: MeSH, Addison Disease (D000224)
• Clinical repositories: SNOMED CT, Addison's disease (363732003)
• ICD 10: Primary adrenocortical insufficiency (E27.1)
![Page 48: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/48.jpg)
(Integrated) concept repositories
• Unified Medical Language System
  http://umlsks.nlm.nih.gov
• NCBO’s BioPortal
  http://www.bioontology.org/tools/portal/bioportal.html
• caDSR
  http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
• Open Biomedical Ontologies (OBO)
  http://obofoundry.org/
![Page 49: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/49.jpg)
Integrating subdomains
[Figure: subdomains and their vocabularies, integrated through the UMLS]
• Biomedical literature: MeSH
• Genome annotations: GO
• Model organisms: NCBI Taxonomy
• Genetic knowledge bases: OMIM
• Clinical repositories: SNOMED CT
• Other subdomains: …
• Anatomy: FMA
![Page 50: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/50.jpg)
Integrating subdomains
• Biomedical literature
• Genome annotations
• Model organisms
• Genetic knowledge bases
• Clinical repositories
• Other subdomains
• Anatomy
![Page 51: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/51.jpg)
Trans-namespace integration
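[Figure: subdomains and their vocabularies, with one disease linked across namespaces through the UMLS]
• Genome annotations: GO
• Model organisms: NCBI Taxonomy
• Genetic knowledge bases: OMIM
• Other subdomains: …
• Anatomy: FMA
• Biomedical literature: MeSH, Addison Disease (D000224)
• Clinical repositories: SNOMED CT, Addison's disease (363732003)
• UMLS: C0001403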
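Trans-namespace integration through a shared concept identifier can be sketched as a lookup followed by a grouping step. The codes and the UMLS concept identifier are the ones shown on the slide; the records and the `group_by_concept` helper are hypothetical.

```python
# Source-specific codes mapped to a UMLS concept identifier (from the slide).
TO_CUI = {
    ("MeSH", "D000224"): "C0001403",
    ("SNOMED CT", "363732003"): "C0001403",
}

# Hypothetical records annotated in different namespaces.
records = [
    {"id": "article-1", "source": "MeSH", "code": "D000224"},
    {"id": "patient-7", "source": "SNOMED CT", "code": "363732003"},
]


def group_by_concept(rows):
    """Group record ids by the shared concept their codes map to."""
    groups = {}
    for row in rows:
        cui = TO_CUI.get((row["source"], row["code"]))
        if cui is not None:  # unmapped codes cannot be integrated
            groups.setdefault(cui, []).append(row["id"])
    return groups


grouped = group_by_concept(records)
```

A literature record and a clinical record end up under the same concept even though neither source knows the other's code.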
![Page 52: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/52.jpg)
Mappings
• Created manually (e.g., UMLS)
  • Purpose
  • Directionality
• Created automatically (e.g., BioPortal)
  • Lexically: ambiguity, normalization
  • Semantically: lack of / incomplete formal definitions
• Key to enabling semantic interoperability
  • Enabling resource for the Semantic Web
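Lexical mapping with normalization can be sketched as follows. The normalization steps (lowercasing, dropping possessives and punctuation, sorting words) are a simplified assumption, not the procedure actually used by the UMLS or BioPortal; the two terms and their codes are the ones from the Addison's disease example.

```python
import re


def normalize(term):
    """Reduce a term to a crude lexical key."""
    term = term.lower()
    term = re.sub(r"['’]s\b", "", term)       # drop possessive 's
    term = re.sub(r"[^a-z0-9 ]", " ", term)   # drop remaining punctuation
    return " ".join(sorted(term.split()))     # ignore word order


def lexical_mappings(vocab_a, vocab_b):
    """Pair codes from two vocabularies whose names share a normalized key."""
    index = {}
    for code, name in vocab_a.items():
        index.setdefault(normalize(name), []).append(code)
    return [
        (code_a, code_b)
        for code_b, name in vocab_b.items()
        for code_a in index.get(normalize(name), [])
    ]


mesh = {"D000224": "Addison Disease"}
snomed = {"363732003": "Addison's disease"}
pairs = lexical_mappings(mesh, snomed)
```

The same key can be produced by terms with different meanings, which is the ambiguity problem noted above; lexical matches are therefore best treated as candidate mappings.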
![Page 53: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/53.jpg)
Challenging issues
Permanent identifiers for biomedical entities
![Page 54: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/54.jpg)
Identifying biomedical entities
• Multiple identifiers for the same entity in different ontologies
• Barrier to data integration in general
  • Data annotated to different ontologies cannot “recombine”
  • Need for mappings across ontologies
• Barrier to data integration in the Semantic Web
  • Multiple possible identifiers for the same entity
    • Depending on the underlying representational scheme (URI vs. LSID)
    • Depending on who creates the URI
![Page 55: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/55.jpg)
Possible solutions
• PURL http://purl.org
  • One level of indirection between developers and users
  • Independence from local constraints at the developer’s end
• The institution creating a resource is also responsible for minting URIs
  • E.g., URI for genes in Entrez Gene
• Guidelines: “URI note”
  • W3C Health Care and Life Sciences Interest Group
• Shared names initiative
  • Identify resources vs. entities
[http://sharedname.org/]
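The "one level of indirection" behind persistent identifiers can be sketched with a toy resolver. The `Resolver` class and all URLs below are hypothetical; this is not the API of purl.org or of any real resolution service.

```python
class Resolver:
    """Toy resolver: persistent identifier -> current location."""

    def __init__(self):
        self._targets = {}

    def register(self, persistent_uri, location):
        """Create or update the location behind a persistent identifier."""
        self._targets[persistent_uri] = location

    def resolve(self, persistent_uri):
        """Return the location currently registered for the identifier."""
        return self._targets[persistent_uri]


resolver = Resolver()
PURL = "http://purl.example.org/gene/9215"  # hypothetical persistent URI

resolver.register(PURL, "http://old-server.example.org/genes?id=9215")
first = resolver.resolve(PURL)

# The developer moves the resource; users keep citing the same identifier.
resolver.register(PURL, "http://new-server.example.org/gene/9215")
second = resolver.resolve(PURL)
```

Only the resolver entry changes when the resource moves, so datasets that cite the persistent identifier do not need to be rewritten.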
![Page 56: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/56.jpg)
Challenging issues
Other issues
![Page 57: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/57.jpg)
Availability
• Many ontologies are freely available
• The UMLS is freely available for research purposes
  • Cost-free license required
  • Licensing issues can be tricky
• SNOMED CT is freely available in member countries of the IHTSDO
• Being freely available
  • Is a requirement for the Open Biomedical Ontologies (OBO)
  • Is a de facto prerequisite for Semantic Web applications
![Page 58: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/58.jpg)
Discoverability
• Ontology repositories
  • UMLS: 152 source vocabularies (biased towards healthcare applications)
  • NCBO BioPortal: ~141 ontologies (biased towards biological applications)
  • Limited overlap between the two repositories
• Need for discovery services
  • Metadata for ontologies
![Page 59: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/59.jpg)
Formalism
• Several major formalisms
  • Web Ontology Language (OWL) – NCI Thesaurus
  • OBO format – most OBO ontologies
  • UMLS Rich Release Format (RRF) – UMLS, RxNorm
• Conversion mechanisms
  • OBO to OWL
  • LexGrid (import/export to LexGrid internal format)
![Page 60: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/60.jpg)
Ontology integration
• Post hoc integration, from the bottom up
  • UMLS approach
  • Integrates ontologies “as is”, including legacy ontologies
  • Facilitates the integration of the corresponding datasets
• Coordinated development of ontologies
  • OBO Foundry approach
  • Ensures consistency ab initio
  • Excludes legacy ontologies
![Page 61: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/61.jpg)
Quality
• Quality assurance in ontologies is still imperfectly defined
  • Difficult to define outside a use case or application
• Several approaches to evaluating quality
  • Collaboratively, by users (Web 2.0 approach)
    • Marginal notes enabled by BioPortal
  • Centrally, by experts
    • OBO Foundry approach
• Important factors besides quality
  • Governance
  • Installed base / Community of practice
![Page 62: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/62.jpg)
Conclusions
• Ontologies are enabling resources for data integration
• Standardization works
  • Grass roots effort (GO)
  • Regulatory context (ICD 9-CM)
• Bridging across resources is crucial
  • Ontology integration resources / strategies (UMLS, BioPortal / OBO Foundry)
• Massive amounts of imperfect data integrated with rough methods might still be useful
![Page 63: Kno.e · 27-05-2009 · Bio2RDF: Towards a mashup to build bioinformatics knowledge systems Identifying disease-causal genes using Semantic Web-based representation of integrated](https://reader035.vdocuments.net/reader035/viewer/2022070622/5e47822b57c1f609fe1725e1/html5/thumbnails/63.jpg)
Medical Ontology Research
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA
Contact:Web: