RDF based on Integration of Pathway Database and Gene Ontology

Upload: ina

Post on 09-Jan-2016

DESCRIPTION

RDF based on Integration of Pathway Database and Gene Ontology. SNU OOPSLA LAB. 2005. DongHyuk Im. Contents: Introduction; Pathway Database; Enzyme Database; Gene Ontology; Related Works; Our Approach; Supporting Function; Data Transformation; Integration of KEGG, Enzyme, Gene Ontology.

TRANSCRIPT

  • RDF based on Integration of Pathway Database and Gene Ontology

    SNU OOPSLA LAB., 2005
    DongHyuk Im

  • Contents

    Introduction
    Pathway Database
    Enzyme Database
    Gene Ontology
    Related Works
    Our Approach
    Supporting Function
    Data Transformation
    Integration of KEGG, Enzyme, Gene Ontology
    Querying using SeRQL

  • Pathway?

    In most chemical reaction mechanisms, an enzyme converts one compound (the substrate) into another compound (the product).
    Importance:
    Comparing and analyzing pathways helps explain how compounds are created and how organisms are related through evolution.
    Drug discovery.

  • Pathway

    Map: Glycolysis / Gluconeogenesis
    Map: Aquifex aeolicus

  • Enzyme Database

    EC number
    Recommended name
    Alternative names (if any)
    Catalytic activity
    Cofactors (if any)
    Pointers to the SWISS-PROT entries that correspond to the enzyme (if any)
    Pointers to diseases associated with a deficiency of the enzyme (if any)

  • Enzyme Hierarchy

    Four levels
    EC number
    Ex) 1.1.1.1 is a member of the top-level group [1]
    The leftmost number identifies the highest level
    [2.4.2.3] and [2.4.2.4] are siblings: similar reactions in a pathway

    Tree diagram: root [*] with children [1], [2], [3]; [2] with children [2.1], [2.2], [2.3]; [2.2] with children [2.2.1], [2.2.2], [2.2.3]; [2.2.2] with children [2.2.2.1], [2.2.2.2], [2.2.2.3]
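
The parent and sibling relations described on this slide can be sketched in a few lines of Python. This is only an illustration of the dotted EC numbering, not code from the presentation; the function names and the `ROOT` constant are hypothetical, with `ROOT` standing in for the `[*]` node of the tree diagram.

```python
from typing import List

ROOT = "*"  # hypothetical label for the root node, mirroring [*] in the diagram


def ec_parent(ec: str) -> str:
    """Return the parent of an EC number; top-level groups hang off ROOT."""
    head, sep, _ = ec.rpartition(".")
    return head if sep else ROOT


def ec_ancestors(ec: str) -> List[str]:
    """Return all ancestors of an EC number, nearest first, ending at ROOT."""
    chain = []
    node = ec
    while node != ROOT:
        node = ec_parent(node)
        chain.append(node)
    return chain


def are_siblings(a: str, b: str) -> bool:
    """Two distinct EC numbers are siblings when they share the same parent."""
    return a != b and ec_parent(a) == ec_parent(b)


print(ec_ancestors("1.1.1.1"))
print(are_siblings("2.4.2.3", "2.4.2.4"))
```

Dropping the last dotted component is enough to walk up the hierarchy because the leftmost number identifies the highest level.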

  • Gene Ontology

  • KEGG

  • KEGG

    To computerize all aspects of cellular functions in terms of the pathways of interacting molecules or genes
    To maintain gene catalogs for all organisms and link each gene product to a pathway component
    To organize a database of all chemical compounds in the cell and link each compound to a pathway component
    To develop computational technologies for pathway comparison, reconstruction, and analysis

  • Why RDF Integration?

    The pathway data model is a DAG. RDF is a good model for representing pathways because the RDF data model is itself a directed graph.
    Biologists need to integrate multiple knowledge sources available on the Internet, which is one of their major problems. RDF is a good model because it gives those sources a common standard.
    Enzyme and GO data have a hierarchical structure, and RDF is a good model for representing hierarchies.
    GO annotation is important: the enzymes (proteins) in a given pathway need GO annotations.

  • Related Works

    KEGG: Kyoto Encyclopedia of Genes and Genomes, 1999, Nucleic Acids Res.
    YeastHub: a semantic web use case for integrating data in the life sciences domain, 2005, Bioinformatics.
    LIGAND: database of chemical compounds and reactions in biological pathways, 2002, Nucleic Acids Res.
    Gene Ontology: tool for the unification of biology, the Gene Ontology Consortium, 2000, Nature Genetics.

  • Our System: Supporting Functions

    KEGG functions: search compound, path prediction, search enzyme
    Functions our system adds:
    Integration query (pathway + enzyme + GO)
    Relaxation query using the GO hierarchy
    Searching pathways using enzyme information

  • Search Compounds

    Compound: C00668 (target)

  • Pathway Prediction Tool

    Compound
    Relaxation query using the enzyme hierarchy

  • Search Enzyme

    Enzyme: 5.3.1.9

  • From Pathway to Gene Ontology

    Select enzyme

  • Data Translation for Integration

    KGML Data -> XSLT -> KEGG RDF Data
    KEGG RDF Data, Enzyme RDF Data, GO RDF Data -> GENOS Storage
    Adding GO ID
    XSLT: http://www.w3.org/2005/02/13-KEGG/
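
To make the KGML-to-RDF step concrete, here is a minimal sketch that turns KGML `entry` elements into subject/predicate/object triples. The presentation does this with the XSLT stylesheet linked above; the Python below is a substitute illustration using only the standard library. The `KEGG_NS` prefix, the predicate names, and the subject naming scheme are assumptions made for the example, and the sample document is a hand-written fragment in the shape of KGML rather than real KEGG output.

```python
import xml.etree.ElementTree as ET

KEGG_NS = "http://example.org/kegg#"  # hypothetical predicate namespace

# Hand-written KGML-like fragment (illustrative only, not downloaded from KEGG).
SAMPLE_KGML = """
<pathway name="path:aae03010" org="aae" number="03010">
  <entry id="1" name="aae:aq_018" type="gene"/>
  <entry id="2" name="aae:aq_020" type="gene"/>
</pathway>
"""


def kgml_to_triples(kgml_text):
    """Convert each KGML entry into (subject, predicate, object) triples."""
    root = ET.fromstring(kgml_text.strip())
    pathway = root.get("name")
    triples = []
    for entry in root.findall("entry"):
        # The subject combines the pathway name and the entry id (assumed scheme).
        subject = f"{pathway}#{entry.get('id')}"
        triples.append((subject, KEGG_NS + "type", entry.get("type")))
        triples.append((subject, KEGG_NS + "name", entry.get("name")))
        triples.append((subject, KEGG_NS + "inPathway", pathway))
    return triples


for triple in kgml_to_triples(SAMPLE_KGML):
    print(triple)
```

A real conversion would also cover KGML `relation` and `reaction` elements, which the KEGG RDF slides list alongside the gene, enzyme, and compound entries.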

  • KEGG RDF Data (1/2)

    Gene entry
    Enzyme entry
    Compound entry
    No information

  • KEGG RDF Data (2/2)

    Relation
    Reaction

  • How to Process KEGG Pathway

    Problem:
    GENOS (Sesame) does not support multiple graphs
    KEGG data consists of multiple documents, e.g. map00010.rdf, aae00010.rdf
    Solution:
    Using namespaces, we can distinguish maps
    When storing pathway data, the pathway's map name is added as a namespace in the resources table of GENOS
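
The namespace idea can be illustrated with a toy resource table. This is a sketch under assumptions, not the GENOS implementation: the `ResourceTable` class and the `map_namespace` helper are hypothetical, and the `aae#00010_` namespace shape is modeled on the values visible in the next slide's figure.

```python
from typing import Dict, Tuple


def map_namespace(document_name: str) -> str:
    """Derive a per-map namespace such as 'aae#00010_' from 'aae00010.rdf'."""
    stem = document_name.rsplit(".", 1)[0]   # drop the file extension
    prefix = stem.rstrip("0123456789")       # organism or 'map' prefix
    number = stem[len(prefix):]              # pathway number
    return f"{prefix}#{number}_"


class ResourceTable:
    """Toy stand-in for a resources table keyed by (namespace, localname)."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], int] = {}

    def intern(self, namespace: str, localname: str) -> int:
        """Return the ID for a resource, assigning a new one on first sight."""
        key = (namespace, localname)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


# Without the map name, entry "1" from two documents collapses into one resource.
shared = ResourceTable()
conflict = shared.intern("kegg#", "1") == shared.intern("kegg#", "1")

# With the map name in the namespace, the two entries stay distinct.
table = ResourceTable()
id_map = table.intern(map_namespace("map00010.rdf"), "1")
id_aae = table.intern(map_namespace("aae00010.rdf"), "1")
print(conflict, id_map, id_aae)
```

Encoding the map name in the namespace keeps a single store usable, since resources from different documents no longer share a key.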

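  • Processing Pathway Data

    [Figure: the resources table of GENOS (columns: ID, NameSpace, Localname) and the triples table of GENOS (columns: Subject, Predicate, Object). The resources shown include Glycolysis/, aae#00010_1, aae#00020_1, map#00010_1, and an aq_ gene identifier; one part of the figure is labeled "conflict".]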

  • Integrating Databases

    Enzyme number
    GO ID
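
One way to read this slide is that the enzyme number acts as the join key between pathway data and GO annotations. The sketch below assumes exactly that; the function name, the dictionary layout, and the GO identifier (a deliberate placeholder, not a real GO term) are all hypothetical.

```python
from typing import Dict, List


def attach_go_ids(
    pathway_enzymes: Dict[str, List[str]],
    ec_to_go: Dict[str, List[str]],
) -> Dict[str, Dict[str, List[str]]]:
    """For every pathway, map each EC number to the GO IDs known for it."""
    annotated = {}
    for pathway, ec_numbers in pathway_enzymes.items():
        annotated[pathway] = {
            ec: list(ec_to_go.get(ec, []))  # empty list when no GO ID is known
            for ec in ec_numbers
        }
    return annotated


# Illustrative inputs; the GO ID is a placeholder, not a real annotation.
pathway_enzymes = {"path:map00010": ["5.3.1.9", "1.1.1.1"]}
ec_to_go = {"5.3.1.9": ["GO:PLACEHOLDER_1"]}

result = attach_go_ids(pathway_enzymes, ec_to_go)
print(result)
```

Enzymes without a known GO ID keep an empty list, which makes missing annotations easy to spot after integration.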

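  • Relaxation Querying using SeRQL

    [Diagram: compounds C1 and C2 connected by enzyme E1; the enzyme condition E1 is relaxed to E1.*]
    SELECT C1, C2 FROM Path_EXP WHERE E1 LIKE "1.*"
    Dewey order: e.g. 1.1 and 1.2 are children of 1, so a prefix match can be used
    SeRQL: subclassof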
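
The prefix-based relaxation can be sketched outside SeRQL as follows. This is an illustration of the idea only: the step tuples, compound labels, and function names are made up for the example, and the matching is done in plain Python rather than through a SeRQL `LIKE` clause.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (substrate, EC number, product)


def relax_ec(ec: str, levels: int = 1) -> str:
    """Drop `levels` trailing components, keeping at least the top-level group."""
    parts = ec.split(".")
    keep = max(1, len(parts) - levels)
    return ".".join(parts[:keep])


def matches(ec: str, prefix: str) -> bool:
    """True when `ec` equals `prefix` or lies below it in Dewey order."""
    return ec == prefix or ec.startswith(prefix + ".")


def find_steps(steps: List[Step], ec_prefix: str) -> List[Tuple[str, str]]:
    """Return (C1, C2) pairs whose enzyme falls under the given EC prefix."""
    return [(c1, c2) for c1, ec, c2 in steps if matches(ec, ec_prefix)]


# Made-up pathway steps with placeholder compound labels.
steps = [
    ("C1", "1.1.1.1", "C2"),
    ("C2", "1.1.1.2", "C3"),
    ("C3", "1.10.2.2", "C4"),
    ("C4", "2.4.2.3", "C5"),
]

exact = find_steps(steps, "1.1.1.1")
relaxed = find_steps(steps, relax_ec("1.1.1.1"))
print(exact, relaxed)
```

Matching on `prefix + "."` rather than on the bare prefix avoids false hits such as `1.10.2.2` matching a query for `1.1`.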

  • Considering Performance

    KEGG: Pathway List, using genes_index

    Genes         Map
    aae:aq_018    path:aae03010
    aae:aq_020    path:aae03010
    aae:aq_021    path:aae00400
    ....
    eco:b1236     path:eco00052
    eco:b1236     path:eco00500
    eco:b1236     path:eco00520
    .
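
The slide does not spell out how genes_index is used. Assuming it is a precomputed lookup from a gene to the pathway maps that contain it, a minimal sketch looks like this; the function name and the whitespace-separated input format are assumptions, while the sample rows are the ones listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Gene/map pairs copied from the slide, one pair per line.
GENE_MAP_ROWS = """\
aae:aq_018 path:aae03010
aae:aq_020 path:aae03010
aae:aq_021 path:aae00400
eco:b1236 path:eco00052
eco:b1236 path:eco00500
eco:b1236 path:eco00520
"""


def build_genes_index(rows: Iterable[str]) -> Dict[str, List[str]]:
    """Group pathway maps by gene so that a lookup avoids scanning every map."""
    index = defaultdict(list)
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        gene, pathway = row.split()
        index[gene].append(pathway)
    return dict(index)


genes_index = build_genes_index(GENE_MAP_ROWS.splitlines())
print(genes_index["eco:b1236"])
```

With such an index, finding the maps for a gene is a single dictionary lookup instead of a pass over every pathway document.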

  • Schedule

    Implementation (~11/30):
    Integrated databases
    Query processor for pathways
    Simple UI (Web: JSP)
    Complete paper (~12/10)