RKB – A Semantic Knowledge Base for RNA (RNA Ontology Consortium meeting)
DESCRIPTION
Increasingly sophisticated knowledge about RNA structure and function requires an inclusive knowledge representation that facilitates the integration of independently generated information arising from such efforts as genome sequencing projects, microarray analyses, structure determination and RNA SELEX experiments. While RNAML, an XML-based representation, has been proposed as an exchange format for a select subset of information, it lacks the machine-understandable semantics that would make it arbitrarily user-extensible, as is the case for formal-logic-based languages. Here, we describe an RNA knowledge base (RKB) for structure-based knowledge using RDF/OWL Semantic Web technologies. RKB contains basic terminology for nucleic acid composition along with context/model-specific representation of structural features such as sugar conformations, base pairings and base stackings. RKB is populated with RNA PDB entries and MC-Annotate structural annotation. The use of Semantic Web technologies addresses the reality of diverse interests of the RNA Ontology Consortium and supports knowledge discovery over independently published RNA knowledge.
TRANSCRIPT
RKB – A Semantic Knowledge Base for RNA
Michel Dumontier 1, José Cruz-Toledo 1
Marc Parisien 2, Francois Major 2
1 Carleton University
2 Université de Montréal
Carleton University -- Dumontier Lab dumontierlab.com
Objectives
i. To represent biochemistry of nucleic acids and their structural characteristics including base pairing/stacking
ii. To represent context-specific knowledge
iii. Capture the structural annotation generated by MC-Annotate
5/25/2009
Guided design
• Modeling with Upper Level Ontologies
– interoperability and semantic coherency
– New Upper Level Ontology (NULO)
  • distinguishes objects, qualities, roles, processes and spatial regions
  • Based on BFO/RO, but for OWL
Biological Modeling
• Objects
– Occupy space
  • Nucleic acids, nucleotides, riboses and phosphates
• Qualities
– Intrinsic categorical or numeric valued property
  • Nucleotide bears the quality of conformation
• Roles
– Defined by extrinsic interactions
  • A C3’ atom may hold the exo role during some sugar puckering
• Processes
– Entities that extend in time
  • structure determination, an interaction
Contextual Modeling of Nucleic Acids
• Base stacking varies in different XRD/NMR models
• Need to know in which model that info is found
• We want to set the stage for representing simulation.
RKB populated with PDB, MC-Annotate
• The ontology population involved 3 steps:
  i. Assigning names
  ii. Asserting class membership
  iii. Assigning relations between entities
• The following naming convention was used:
– Objects:
  • Polymer: PDBID_cCHAIN
  • Residue: PDBID_cCHAIN_rRESIDUE
  • Atom: PDBID_cCHAIN_rRESIDUE_aAtom
– Quality/Roles
  • PDBID_mMODEL_cCHAIN_rRESIDUE_type
– Processes
  • Structure determination: PDBID_mMODEL
  • Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
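The naming convention above can be sketched as a few string-building helpers. This is only an illustration of the pattern on the slide, not the code used to populate the RKB; the function names and the example chain and residue values are hypothetical, and the interaction identifier is left out because the slide does not spell out how PROCESSTYPE and PARTICIPANT are written.

```python
# Hypothetical helpers that follow the naming pattern shown on the slide.

def polymer_id(pdb_id: str, chain: str) -> str:
    """PDBID_cCHAIN"""
    return f"{pdb_id}_c{chain}"


def residue_id(pdb_id: str, chain: str, residue: int) -> str:
    """PDBID_cCHAIN_rRESIDUE"""
    return f"{polymer_id(pdb_id, chain)}_r{residue}"


def atom_id(pdb_id: str, chain: str, residue: int, atom: str) -> str:
    """PDBID_cCHAIN_rRESIDUE_aAtom"""
    return f"{residue_id(pdb_id, chain, residue)}_a{atom}"


def quality_id(pdb_id: str, model: int, chain: str, residue: int, kind: str) -> str:
    """PDBID_mMODEL_cCHAIN_rRESIDUE_type (qualities and roles are model-specific)"""
    return f"{pdb_id}_m{model}_c{chain}_r{residue}_{kind}"


def structure_determination_id(pdb_id: str, model: int) -> str:
    """PDBID_mMODEL"""
    return f"{pdb_id}_m{model}"
```

Note that object names carry no model number while quality, role and process names do, which is what lets one residue be described differently in each model.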
Support for Leontis-Westhof Nomenclature
• The RKB incorporates LW nomenclature
• Describes the three edges for H-bonding interactions in purines (R) and pyrimidines (Y)
• Atom composition:
  i. Watson-Crick Edge:
    • A(N6)/G(O6), R(N1), A(C2)/G(N2), U(O4)/C(N4), Y(N3) and Y(O2)
  ii. Hoogsteen Edge (CH edge for Y):
    • A(N6)/G(O6), R(N7), U(O4)/C(N4) and Y(C5)
  iii. Sugar Edge:
    • A(C2)/G(N2), R(N3), Y(O2) and O2’
• cis and trans orientations
  • relative orientations of the glycosidic bonds of the two interacting nucleotides
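The atom composition of the three edges can be written down as a small lookup table. The sketch below is a plain transcription of the list above, reading R as purine (A, G) and Y as pyrimidine (C, U) per the usual IUPAC codes; the dictionary layout and the function name are assumptions made for illustration and do not reflect how the RKB stores this information.

```python
# Edge -> base -> atoms, transcribed from the slide's atom composition list.
EDGE_ATOMS = {
    "WatsonCrick": {
        "A": ["N6", "N1", "C2"],
        "G": ["O6", "N1", "N2"],
        "U": ["O4", "N3", "O2"],
        "C": ["N4", "N3", "O2"],
    },
    "Hoogsteen": {
        "A": ["N6", "N7"],
        "G": ["O6", "N7"],
        "U": ["O4", "C5"],
        "C": ["N4", "C5"],
    },
    "Sugar": {
        "A": ["C2", "N3", "O2'"],
        "G": ["N2", "N3", "O2'"],
        "U": ["O2", "O2'"],
        "C": ["O2", "O2'"],
    },
}


def edges_for_atom(base: str, atom: str) -> list[str]:
    """Return the (alphabetically sorted) edges of `base` that include `atom`."""
    return sorted(
        edge for edge, by_base in EDGE_ATOMS.items() if atom in by_base[base]
    )
```

Corner atoms such as A(N6) belong to two edges, which is why the lookup returns a list and not a single edge.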
Support for LW+ Nomenclature
• The extension adds faces to each edge:
– WC edge:
  • Wh, Ww and Ws faces
– Hoogsteen Edge:
  • C8(R), Hh, Hw and Bh
– Sugar Edge:
  • Bs, Ss(Y), Sw and O2’
• The Bh and Bs faces involve the Hoogsteen-side amino/keto group and the sugar-side amino/keto group, respectively.
• The C8 face was introduced for the C8-H8 donor group in purines
Describing Base Pairs
• Base pairs are composed of interactions between the edges or faces of the interacting bases
• Role chains capture additional knowledge:
– Objects that participate in sub-processes (face interactions) are also participants of the process whole (base pair)
  hasPart ◦ hasParticipant -> hasParticipant
– Objects are involved in processes when their qualities are participants
  isBearerOf ◦ isParticipantIn -> isParticipantIn
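What the two role chains entail can be illustrated with a naive forward-chaining pass over a set of triples. This is a toy sketch only: an OWL reasoner would derive these statements from the property chain axioms, and the individual names (`bp1`, `fi1`, `faceA`, `resA`) are made up for the example.

```python
def apply_chain(triples, first, second, result):
    """Whenever x -first-> y and y -second-> z hold, add x -result-> z.

    Repeats until no new triple appears, so chains whose result feeds
    back into their own premises (as both chains here do) are closed.
    """
    triples = set(triples)
    while True:
        inferred = {
            (x, result, z)
            for (x, p, y) in triples
            if p == first
            for (y2, q, z) in triples
            if q == second and y2 == y
        }
        new = inferred - triples
        if not new:
            return triples
        triples |= new


# Hypothetical individuals: a base pair with one face interaction,
# and a residue bearing the face that takes part in it.
FACTS = {
    ("bp1", "hasPart", "fi1"),
    ("fi1", "hasParticipant", "faceA"),
    ("resA", "isBearerOf", "faceA"),
    ("faceA", "isParticipantIn", "fi1"),
}

closed = apply_chain(FACTS, "hasPart", "hasParticipant", "hasParticipant")
closed = apply_chain(closed, "isBearerOf", "isParticipantIn", "isParticipantIn")
```

After both passes the base pair has the face as a participant and the residue participates in the face interaction, neither of which was asserted directly. The sketch does not model the inverse relationship between hasParticipant and isParticipantIn.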
The RKB is compatible with both the LW and the Saenger nomenclature for base pairs
• The semantics of the RKB enables the usage of consistent bp naming schemes
• The A-A base pair in model 4 of PDB:1B36 can be classified as a member of the following classes:
– Saenger type II
– LW Trans Hoogsteen/Hoogsteen (8)
A-A base pair class expression:
NucleotideBasePair
and ParallelBasePair
and TransBasePair
and HoogsteenHoogsteenBasePair
and hasAgent exactly 2 AMP
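The "(8)" attached to Trans Hoogsteen/Hoogsteen is its position among the twelve geometric families of Leontis and Westhof (2001), which are ordered by edge pair (WC/WC, WC/H, WC/S, H/H, H/S, S/S) with cis listed before trans. A minimal sketch of that numbering follows; the single-letter edge codes and the function name are choices made for this example, not RKB identifiers.

```python
# Rank used to put the two edges in canonical order (W before H before S).
_EDGE_ORDER = {"W": 0, "H": 1, "S": 2}

# Position of each unordered edge pair in the Leontis-Westhof family table.
_PAIR_INDEX = {
    ("W", "W"): 0,
    ("W", "H"): 1,
    ("W", "S"): 2,
    ("H", "H"): 3,
    ("H", "S"): 4,
    ("S", "S"): 5,
}


def lw_family(orientation: str, edge1: str, edge2: str) -> int:
    """Return the LW family number (1-12) for an orientation and two edges."""
    a, b = sorted((edge1, edge2), key=_EDGE_ORDER.__getitem__)
    base = 2 * _PAIR_INDEX[(a, b)]
    if orientation == "cis":
        return base + 1
    if orientation == "trans":
        return base + 2
    raise ValueError(f"orientation must be 'cis' or 'trans', got {orientation!r}")
```

For the A-A pair above, `lw_family("trans", "H", "H")` gives family 8.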
Sugar Puckering
• The ribose ring presents two distinct puckering modes, envelope and twist
• The classification into either geometry is dependent on the relative position of the carbon atoms of the ribose to its C5’ atom
• Carbon atoms in a ribose thus bear either the endo or exo role with respect to the plane formed by the other atoms
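As a rough illustration of how the endo or exo role could be assigned from coordinates, the sketch below checks on which side of a reference plane an atom sits and compares that with the side of C5’, following the usual convention that endo means the same side as C5’. It is a simplified geometric sketch under stated assumptions: the plane is given by three ring atoms chosen by the caller, the tolerance is applied to an unnormalised value, and none of this is taken from MC-Annotate or the RKB.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pucker_role(plane, atom, c5_prime, tol=1e-6):
    """Classify `atom` as 'endo', 'exo' or 'in-plane'.

    plane    -- three (x, y, z) points spanning the reference plane
    atom     -- coordinates of the ring atom being classified
    c5_prime -- coordinates of the C5' atom, which defines the endo side
    """
    p0, p1, p2 = plane
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    side_atom = _dot(normal, _sub(atom, p0))
    side_c5 = _dot(normal, _sub(c5_prime, p0))
    if abs(side_c5) <= tol:
        raise ValueError("C5' lies in the reference plane; side is undefined")
    if abs(side_atom) <= tol:
        return "in-plane"
    return "endo" if side_atom * side_c5 > 0 else "exo"
```

Because the role depends on the coordinates of one particular model, the same carbon atom can be endo in one model and exo in another.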
Sugar Puckering (cont’d)
Our implementation of situational modeling ensures that objects are represented by a single entity throughout their lifetime, thus avoiding the need to create multiple distinct instances of the same object, each with different attributes, for every spatio-temporal context
RKB is SPARQL accessible
• SPARQL is a graph query language
• Loaded instantiated ontology into Virtuoso 6
• SPARQL endpoint
– http://codemonkey.dumontierlab.com/sparql/
• Specify Graphs to restrict search
– http://semanticscience.org/rkb/mcannotate/pdb/dna
– http://semanticscience.org/rkb/mcannotate/pdb/rna
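A request against the endpoint, restricted to one of the graphs above, could be assembled as follows. The sketch only builds the URL using the standard SPARQL Protocol parameters `query` and `default-graph-uri`; it does not send anything, and whether the endpoint still responds is not something it checks. The function and constant names are assumptions for this example.

```python
from urllib.parse import urlencode

# Endpoint and graph URIs as listed on the slide.
ENDPOINT = "http://codemonkey.dumontierlab.com/sparql/"
DNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/dna"
RNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/rna"


def build_query_url(query: str, graph: str = RNA_GRAPH, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL that runs `query` with `graph` as the default graph."""
    params = [("default-graph-uri", graph), ("query", query)]
    return endpoint + "?" + urlencode(params)
```

Passing the graph as `default-graph-uri` keeps the query text itself free of graph clauses, so the same query can be pointed at the DNA or the RNA graph.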
Query 1: Find all face interactions (model 1 of PDB:1B36)
PREFIX ss: <http://semanticscience.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?faceInteraction
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
}
Nucleotide base pairs are composed of one or more face interactions. Where these are known, as in the MC-Annotate results, the query retrieves all 18 matching instances.
See results: http://tinyurl.com/porxdb
Query 2: Find all C8 mediated base pairs (model 1 of PDB:1B36)
PREFIX ss: <http://semanticscience.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?faceInteraction ?residue ?C8Face
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?C8Face ss:isAgentIn ?faceInteraction .
  ?C8Face rdf:type ss:C8Face .
  ?residue ss:hasQuality ?C8Face .
}
Results: http://tinyurl.com/r7b5e4
Face interactions are mediated by the faces of bases. Nucleotides and their face qualities are related by the hasQuality relation, whereas faces are agents in the face interaction, and are related by the hasAgent relation.
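The join that Query 2 performs can be traced by hand on a toy graph. The sketch below walks the same path (model, pair, face interaction, C8 face, residue) over an in-memory set of triples; the individuals are invented, `WwFace` is a hypothetical class name, and prefixes are dropped for brevity. It illustrates the shape of the data and is not a substitute for running the query on the endpoint.

```python
# Toy data: one face interaction with a C8 face on res1 and another face on res2.
TRIPLES = {
    ("pair1", "isProperPartOf", "1B36_m1"),
    ("pair1", "hasProperPart", "fi1"),
    ("fi1", "type", "FaceInteraction"),
    ("face1", "type", "C8Face"),
    ("face1", "isAgentIn", "fi1"),
    ("res1", "hasQuality", "face1"),
    ("face2", "type", "WwFace"),  # hypothetical class name
    ("face2", "isAgentIn", "fi1"),
    ("res2", "hasQuality", "face2"),
}


def objects(triples, subject, predicate):
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def c8_mediated(triples, model):
    """Return (faceInteraction, residue, C8 face) rows, mirroring Query 2."""
    rows = set()
    for pair in subjects(triples, "isProperPartOf", model):
        for interaction in objects(triples, pair, "hasProperPart"):
            if (interaction, "type", "FaceInteraction") not in triples:
                continue
            for face in subjects(triples, "isAgentIn", interaction):
                if (face, "type", "C8Face") not in triples:
                    continue
                for residue in subjects(triples, "hasQuality", face):
                    rows.add((interaction, residue, face))
    return rows
```

Only the residue bearing the C8 face is returned; its partner in the same face interaction is filtered out by the type check on the face.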
Query 3: Find base pairs involving a GMP sugar-sugar face (model 1 of PDB:1B36)
PREFIX ss: <http://semanticscience.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?faceInteraction ?residue ?hasSSFace
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasSSFace rdf:type ss:SugarSugarFace .
  ?hasSSFace ss:isAgentIn ?faceInteraction .
  ?residue ss:hasQuality ?hasSSFace .
  ?residue rdf:type ss:GMP
}
Results found at: http://tinyurl.com/qpup8z
This query builds on Query 2, in that it requires an Ss face to be on a GMP that is participating in a base pair. Two GMPs are found to have this face participating in base pairs with other nucleotides in this structure.
Query 4: Find Hoogsteen – O2’ face interactions (model 1 of PDB:1B36)
PREFIX ss: <http://semanticscience.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?faceInteraction ?residue1 ?residue2 ?hasHhFace ?hasO2pFace
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasHhFace rdf:type ss:HoogsteenHoogsteenFace .
  ?hasHhFace ss:isAgentIn ?faceInteraction .
  ?hasO2pFace rdf:type ss:O2pFace .
  ?hasO2pFace ss:isAgentIn ?faceInteraction .
  ?residue1 ss:hasQuality ?hasHhFace .
  ?residue2 ss:hasQuality ?hasO2pFace
}
Results found at: http://tinyurl.com/oo4fp8
The LW+ nomenclature is more detailed for base interactions. The result of this query describes a single base pair in this structure.
Future Directions
• Specify Saenger nomenclature
• Map other structural annotator output (e.g. 3DNA)
• Extend structural knowledge with 6 backbone angles
– range restrictions on classes
• SWRL / DL-safe rules or SPARQL query required to specify cyclic motifs
• Publish as part of Bio2RDF network
RKB Availability
• Creative Commons License.
• Google Code Project:
– http://semanticscience.org
• Instructions: http://code.google.com/p/semanticscience/wiki/RKBDownload
References
• Dumontier, M., et al. (2009). RKB: A Semantic Web Knowledge Base for RNA. Accepted in Bio-Ontologies 2009, Stockholm, Sweden.
• Smith, B., et al. (2005). Relations in biomedical ontologies. Genome Biol, 6(5): R46.
• Leontis, N. B. and E. Westhof (2001). Geometric nomenclature and classification of RNA base pairs. RNA, 7(4): 499-512.
• Lemieux, S. and F. Major (2002). RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire. Nucleic Acids Res, 30(19): 4250-63.
• Major, F. and P. Thibault (2005). Computer modeling of RNA three-dimensional structures. In Encyclopedia of Molecular Cell Biology and Molecular Medicine, R.A. Meyers, Editor. Wiley-VCH Verlag GmbH & Co.: Weinheim. p. 605-636.