rna strat: rna secondary structure analysis toolkitcchauve/publications/ismb08-d31-poster.pdf · a...

A STEM/STEM-LOOP DECOMPOSITION BASED HEURISTIC

RNA secondary structures are decomposed into stems and stem-loops, that are then compared using an exact algorithm for the conservative edit distance [Guignon et al., 2005], a variant of the general distance of [Jiang et al., 2002].

Then, these hairpins pairwise comparison are used in a Smith-Waterman based heuristic to produce a distance and alignment between the two complete structures.

SEARCH FOR STRUCTURAL HOMOLOGS

For example, structured RNA gene detected by a high-throughput computational analysis.

Structures database

ALGORITHMIC MODELEDIT DISTANCE

The edit distance algorithm is the following problem: - given

● two RNA secondary structures,● a set of allowed edit operations and● a cost for each possible operation,

- compute● an alignment of minimum cost between the two

structures.RNAStrAT uses the edit distance model defined in [Jiang et al. 2002] that comports:● single base edit operations (substitution/insertion/deletion),● base pairs operations such as insertion/deletion, creation/opening or alteration of a hydrogene bond.

This model was introduced by [Jiang et al., 2002]. Computing the distance is an NP-hard problem, but several less general version of this problem can be solved exactly and are used in widely in RNA secondary strcutures comparison tools such as RNAForrester [Höchsmann et al., 2004].

ALIGNMENT OFSECONDARY STRUCTURES

SUMMARYRNAStrAT is a web server dedicated to the comparison of sets of RNA secondary structures. This server offers tools to align pairs of RNA secondary structures and to search for structural homologs in a database of RNA secondary structures (based on the RFAM). The alignment and search are based on an edit distance algorithm that considers a wide range of edit operations defined in [Jiang et al., 2002]. Tools for the vizualisation of secondary structures and structures alignments are also available. Up to date RNA StrAT is the only server offering all these features (general RNA edit model, RFAM database search, rendering) together.

Availability: http://www-lbit.iro.umontreal.ca/rnastrat/

Contact: guignonv@yahoo.fr

ADDITIONAL FEATURES AND DEVELOPMENT

The database of RNA secondary structures

Users can access to structure information including links to its RNA family (in the Rfam classification), its organism taxonomy (EMBL), its sequence (EMBL). Structures stored in our database are extracted from the Rfam seed alignments and for each RNA gene, its specific secondary structure is obtained from both its sequence and the family consensus structure.

Database search improvementsIn order to speed-up the database search the search engines first analyze the structural characteristics of the query structure to select a group of candidates in the database that share similar characteristics close to the query ones, eliminating at the same time irrelevant structures. Then, the query structure is compared to these candidates to find which ones have the best similarity scores. The user can modify the parameters that define the candidates.

Structures and alignment rendering

High structuralsimilarity score

Query structure

Valentin Guignon1, Cedric Chauve2 and Sylvie Hamel11. DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada.2. Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada. CGL and LaCIM, Université du Québec à Montréal, Montréal, QC, Canada.

RNA StrAT: RNA Secondary Structure Analysis Toolkit

Figure 1. Structural alignment between two 5S rRNA. Some edit operations are displayed on the structures using arrows.

Delftia acidovoransEscherichia coli

5 ’3 '

1 2 0UGCCUGGCGG C CG

UAG C G C G G U G

G U C CC A C C U G A

C C C C AUGC

CGAACUCAG

AAACGCCGU

CGAUGGUA

GUGUGGGGUCU

CCCCAUGC

GAGAGUAGG

ACUGCCAGGCAU

5 '3 '

UGCCUGAUGA C CA

UAG C A A G U U G

G U A CC A C U C C U

U C C C AUCC

CGAACAGGA

AAACGACUU

CGAUGAUA

GUGCGGG

UUCCCGUGU

GAAAGUAGG

AUCGUCAGGCNN

5 01 0 0

>Escherichia coli, V00336UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU((((((((((.....((((((((....(((((((.............))))..)))...)))))).)).(((((((..((((((((...))))))))..)))))))...)))))))))).>Delftia acidovoransUGCCUGAUGACCAUAGCAAGUUGGUACCACUCCUUCCCAUCCCGAACAGGACAGUGAAACGACUUUGCGCCGAUGAUAGUGCGGG-U-U-CCCGUGUGAAAGUAGGUCAUCGUCAGGCNN.(((((((((.....((((((((....(((((((.............))))..)))...)))))).)).((.((....((((((.-.-.-.))))))....)).))...)))))))))..

pair match

base substitution base match

completion/altering

pair substitutionpair half-match

pair deletion/insertion

pair opening/creation

base deletion/insertion

Figure 2. Given a query structure and a database of structures, structural homologs can be found using structural similarity scores.

C AG GG C

C GG GG C

CG GG C

base substitution

deletion

base insertion

G UG C

G CG C

G UG C

C AGGG CC

GG AG C

paired basesubstitution

pair substitution

pair deletion

pair insertion

G CG C

G UG CG C

pair creation

pair opening

pair completion

pair altering

Figure 3. Edit operations.

A UAG G G CG G AG G G A A G CUCAUC

AGUGGGGCCACGAGC

UGAGUGCGUCCUG U C A CU C

C C C A UGUC C C UUGGGAAGGUCUGAGACUA

GGGCCA

CGGCCC

UAACAGGGCUCUC

GC U UCG GGG AG GU

GAGUUC

CCAGAG A

A C G GGG CUCCG

CG CGAGGUCAG A CU

GGGCAGG AG A UG C

A C CCCG CCC

U UCGGGGAGGGGCCCGGCGGAUGC

CUCCUUUGCCGGAGC

UUGGAACAGACUCACG G C C A G C G A A

GUGAGUUCAAUGGCUGAG GU

GAGGUACCCCG CAGGGG

ACCUCAU A A CCCAAUUCAGACCAC

UCUCCUCCGCCCAUU

5´3´

P3aP3b P4

P9P10/11

G AG C

G UC C U G

UC A C U C C A C U CCCAUGUCCCU GG

CCCUA A CAG

GGCU CU

GAG C UUCGGGG

AG GUCAG A CU

G GA G

A UGCC

CU U C G

GGG AGGGG

GGCG GA

UGCCUCCGUGAGUUCC

CAGAGAA

C G G G GCUCCGC GC

GA UUGCCGGAGCUU

GGAACAGACUC

ACGGCC

GGCCCU

CAUGAG

AU AGG GCGG

AGGGUCCU

GUGAGGU

A CCCCG C A

GGGGACCUCAU

decomposition

Figure 4. Stem/stem-loop decomposition of an Rnase P structure.

Figure 5. Edit distance computation between 2 Rnase P structures.

Structure rendering(see figure 7)

Figure 6. Database browsing features.

Figure 7. Structure online rendering (base on Vienna RNA Package): the rendering form (on the left) enables the user to display structures in various ways (on the right) and browse the structure (middle).

Figure 8. The rendering engine enables to compare two aligned structures (on the left) and see which bases changed from a structure to the other. The characteristics of a structure compared to a set of structures can also be rendered (on the right). Each base of the query structure is displayed with a pie chart that shows how often the base has been kept, replaced by an other one or deleted.

REFERENCESGriffiths-Jones S., Bateman A., Marshall M., Khanna A., Eddy S.R. RFam: an RNA family database, Nucleic Acids Research, 2003, 31, 1, 439-441.

Jiang T., Lin G., Ma B., Zhang K. A general edit distance between RNA structures. J. Comput. Biol., 9(2):371–388. 2002.

Guignon V., Chauve C., Hamel S. Distance d'édition entre tige-boucles, JOBIM 2005, 2005, poster 82.

Smith T.F., Waterman M.S. Identification of Common Molecular Subsequences, Journal of Molecular Biology 147: 195–197. doi:10.1016/0022-2836(81)90087-5, 1981.

Höchsmann M., Voss B., Giegerich R. Pure Multiple RNA Secondary Structure Alignments: A Progressive Profile Approach in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 2004, pp53-62.

Vienna RNA Package, http://www.tbi.univie.ac.at/~ivo/RNA

RNase P SM-A12(14)

Stem/stem-loop edit distance Global edit distance

RNase P SM-A12(14)

RNase P SM-A18(31) RNase P SM-A12(14)

gaggaaaguccgggcUCC U

ACAGGG

CGCCAGGUAAC

G CCUGGGGG

A G C C C AC GGAA A G U

GCCACAGAAAA

U AU A C

CGCCAGCUU CG

GCUGGU AAG

G G U G A AAUG G U G C GG

U AAGAGCGCACC

GC G C G A C U G G

GGCUUGC

GGCACGGUAAACCCC G C C C G G A G C A A G A C C

AAA U AG

G G G A G CAUG U

C C G UCG U G

ACGGGCUCCCGGGUA

GGUUGCUUGAG G U G G C C G G U

GACGGCUAUCCCAGAUGAAUGGUUGUCG A UG ac

agaacccggcuuac

180 200

gaggaaaguccgggcUCCA

AGCGCGGU

GCCGGAUAAC

G UCCGGCGG

A C C U C AG GGA

AA G UGC

CACAGAAAG

C A A A CCGCCCUCGAGGCCGA AA

GGCUUC GCGGAGGGU AAGG G U G A AA

GG G U G C GG

U AAGAGCGCACC

GC G U C U U U G G

CAACAAA

GGCGGCAAGGCAAAC

CCC A C C G G G A C C

AAAU AG

G G G C U GCACG

GACGAGAG A

UCGUCCAG G U C U G

UUUCCAGACCCGCGGCCC

GGGUUGGUU

GCAAGAG G C G U C U C G C

AAGAGGCGUCCCAGAUGAAUGGCCAUC A C

CUCGCAG C AA

UGCGAGG A a c

agaacccggcuua

160180

240260

A ACG A G1

rna strat: rna secondary structure analysis toolkitcchauve/publications/ismb08-d31-poster.pdf · a...

Documents

single-cell rna sequencing of human embryonic stem cell...

rna bioinformatics genes and secondary structure

rna secondary structure biological functions & … secondary...

cotranscriptional kinetic folding of rna secondary...

rna structure prediction rfam – rna structures database...

rna stem structure governs coupling of dicing and gene...

the rna secondary structure dependence of rna-protein

turbo-decoding of rna secondary...

rna: secondary structure prediction and analysis

1 rna-seq of rice yellow stem borer, scirpophaga ... · 1...

rna secondary structure prediction. 16s rrna rna secondary...

rna secondary structure what is rna? definition of rna...

applied bioinformatics week 11. topics protein secondary...

inferring secondary structure from rna alignments and...

tutorial 11 rna structure prediction. introduction – rna...

non-coding rna gene finding problems. outline introduction...

rna secondary structure prediction

folding and finding rna secondary...

rna secondary structure prediction tool

improving free energy functions for rna folding rna...