rna strat: rna secondary structure analysis toolkitcchauve/publications/ismb08-d31-poster.pdf · a...
Post on 29-May-2020
11 Views
Preview:
TRANSCRIPT
A STEM/STEM-LOOP DECOMPOSITION BASED HEURISTIC
RNA secondary structures are decomposed into stems and stem-loops, that are then compared using an exact algorithm for the conservative edit distance [Guignon et al., 2005], a variant of the general distance of [Jiang et al., 2002].
Then, these hairpins pairwise comparison are used in a Smith-Waterman based heuristic to produce a distance and alignment between the two complete structures.
SEARCH FOR STRUCTURAL HOMOLOGS
For example, structured RNA gene detected by a high-throughput computational analysis.
Structures database
ALGORITHMIC MODELEDIT DISTANCE
The edit distance algorithm is the following problem: - given
● two RNA secondary structures,● a set of allowed edit operations and● a cost for each possible operation,
- compute● an alignment of minimum cost between the two
structures.RNAStrAT uses the edit distance model defined in [Jiang et al. 2002] that comports:● single base edit operations (substitution/insertion/deletion),● base pairs operations such as insertion/deletion, creation/opening or alteration of a hydrogene bond.
This model was introduced by [Jiang et al., 2002]. Computing the distance is an NP-hard problem, but several less general version of this problem can be solved exactly and are used in widely in RNA secondary strcutures comparison tools such as RNAForrester [Höchsmann et al., 2004].
ALIGNMENT OFSECONDARY STRUCTURES
SUMMARYRNAStrAT is a web server dedicated to the comparison of sets of RNA secondary structures. This server offers tools to align pairs of RNA secondary structures and to search for structural homologs in a database of RNA secondary structures (based on the RFAM). The alignment and search are based on an edit distance algorithm that considers a wide range of edit operations defined in [Jiang et al., 2002]. Tools for the vizualisation of secondary structures and structures alignments are also available. Up to date RNA StrAT is the only server offering all these features (general RNA edit model, RFAM database search, rendering) together.
Availability: http://www-lbit.iro.umontreal.ca/rnastrat/
Contact: guignonv@yahoo.fr
ADDITIONAL FEATURES AND DEVELOPMENT
The database of RNA secondary structures
Users can access to structure information including links to its RNA family (in the Rfam classification), its organism taxonomy (EMBL), its sequence (EMBL). Structures stored in our database are extracted from the Rfam seed alignments and for each RNA gene, its specific secondary structure is obtained from both its sequence and the family consensus structure.
Database search improvementsIn order to speed-up the database search the search engines first analyze the structural characteristics of the query structure to select a group of candidates in the database that share similar characteristics close to the query ones, eliminating at the same time irrelevant structures. Then, the query structure is compared to these candidates to find which ones have the best similarity scores. The user can modify the parameters that define the candidates.
Structures and alignment rendering
High structuralsimilarity score
Query structure
Valentin Guignon1, Cedric Chauve2 and Sylvie Hamel11. DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada.2. Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada. CGL and LaCIM, Université du Québec à Montréal, Montréal, QC, Canada.
RNA StrAT: RNA Secondary Structure Analysis Toolkit
Figure 1. Structural alignment between two 5S rRNA. Some edit operations are displayed on the structures using arrows.
Delftia acidovoransEscherichia coli
5 ’3 '
5 0
1 0 0
1 0
1 2 0UGCCUGGCGG C CG
UAG C G C G G U G
G U C CC A C C U G A
C C C C AUGC
CGAACUCAG
AAGUG
AAACGCCGU
AGCGC
CGAUGGUA
GUGUGGGGUCU
CCCCAUGC
GAGAGUAGG
G
A
ACUGCCAGGCAU
5 '3 '
UGCCUGAUGA C CA
UAG C A A G U U G
G U A CC A C U C C U
U C C C AUCC
CGAACAGGA
CAGUG
AAACGACUU
UGCGC
CGAUGAUA
GUGCGGG
UUCCCGUGU
GAAAGUAGG
U
C
AUCGUCAGGCNN
1 0
5 01 0 0
1 10
>Escherichia coli, V00336UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU((((((((((.....((((((((....(((((((.............))))..)))...)))))).)).(((((((..((((((((...))))))))..)))))))...)))))))))).>Delftia acidovoransUGCCUGAUGACCAUAGCAAGUUGGUACCACUCCUUCCCAUCCCGAACAGGACAGUGAAACGACUUUGCGCCGAUGAUAGUGCGGG-U-U-CCCGUGUGAAAGUAGGUCAUCGUCAGGCNN.(((((((((.....((((((((....(((((((.............))))..)))...)))))).)).((.((....((((((.-.-.-.))))))....)).))...)))))))))..
pair match
base substitution base match
completion/altering
pair substitutionpair half-match
pair deletion/insertion
pair opening/creation
base deletion/insertion
Figure 2. Given a query structure and a database of structures, structural homologs can be found using structural similarity scores.
C AG GG C
C GG GG C
CG GG C
C AGG
G CG
base substitution
base
deletion
base insertion
CG AG
G UG C
CG AG
G CG C
CA AG
G UG C
C AGGG CC
U
GG AG C
G CU
paired basesubstitution
pair substitution
pair deletion
pair insertion
CG UG
G CG C
G
CG UG
G C
CA
G
G UG CG C
CGCUG
G CG
pair creation
pair opening
pair completion
pair altering
Figure 3. Edit operations.
A UAG G G CG G AG G G A A G CUCAUC
AGUGGGGCCACGAGC
UGAGUGCGUCCUG U C A CU C
CAC U
C C C A UGUC C C UUGGGAAGGUCUGAGACUA
GGGCCA
GAGG
CGGCCC
UAACAGGGCUCUC
CCUGA
GC U UCG GGG AG GU
GAGUUC
CCAGAG A
A C G GGG CUCCG
CG CGAGGUCAG A CU
GGGCAGG AG A UG C
CGUGG
A C CCCG CCC
U UCGGGGAGGGGCCCGGCGGAUGC
CUCCUUUGCCGGAGC
UUGGAACAGACUCACG G C C A G C G A A
GUGAGUUCAAUGGCUGAG GU
GAGGUACCCCG CAGGGG
ACCUCAU A A CCCAAUUCAGACCAC
UCUCCUCCGCCCAUU
5´3´
UP1
P2
P3aP3b P4
P7
P8
P9P10/11
P12
P19
GGG G
CCAC
G AG C
U GA
GUGC
G UC C U G
UC A C U C C A C U CCCAUGUCCCU GG
CCCUA A CAG
GGCU CU
CCCU
GAG C UUCGGGG
AG GUCAG A CU
GGGC
G GA G
A UGCC
GUGG
A CCC
CGCC
CU U C G
GGG AGGGG
CCC
GGCG GA
UGCCUCCGUGAGUUCC
CAGAGAA
C G G G GCUCCGC GC
GA UUGCCGGAGCUU
GGAACAGACUC
ACGGCC
GGCCCU
CAUGAG
AU AGG GCGG
AGGGUCCU
CCGC
CCAU
GUGAGGU
A CCCCG C A
GGGGACCUCAU
decomposition
Figure 4. Stem/stem-loop decomposition of an Rnase P structure.
Figure 5. Edit distance computation between 2 Rnase P structures.
Structure rendering(see figure 7)
Figure 6. Database browsing features.
Figure 7. Structure online rendering (base on Vienna RNA Package): the rendering form (on the left) enables the user to display structures in various ways (on the right) and browse the structure (middle).
Figure 8. The rendering engine enables to compare two aligned structures (on the left) and see which bases changed from a structure to the other. The characteristics of a structure compared to a set of structures can also be rendered (on the right). Each base of the query structure is displayed with a pie chart that shows how often the base has been kept, replaced by an other one or deleted.
REFERENCESGriffiths-Jones S., Bateman A., Marshall M., Khanna A., Eddy S.R. RFam: an RNA family database, Nucleic Acids Research, 2003, 31, 1, 439-441.
Jiang T., Lin G., Ma B., Zhang K. A general edit distance between RNA structures. J. Comput. Biol., 9(2):371–388. 2002.
Guignon V., Chauve C., Hamel S. Distance d'édition entre tige-boucles, JOBIM 2005, 2005, poster 82.
Smith T.F., Waterman M.S. Identification of Common Molecular Subsequences, Journal of Molecular Biology 147: 195–197. doi:10.1016/0022-2836(81)90087-5, 1981.
Höchsmann M., Voss B., Giegerich R. Pure Multiple RNA Secondary Structure Alignments: A Progressive Profile Approach in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 2004, pp53-62.
Vienna RNA Package, http://www.tbi.univie.ac.at/~ivo/RNA
RNase P SM-A12(14)
RN
ase
P S
M-A
18(3
1)
Stem/stem-loop edit distance Global edit distance
RNase P SM-A12(14)
RN
ase
P S
M-A
18(3
1)
RNase P SM-A18(31) RNase P SM-A12(14)
gaggaaaguccgggcUCC U
UCGG
ACAGGG
CGCCAGGUAAC
G CCUGGGGG
GCGUG
A G C C C AC GGAA A G U
GCCACAGAAAA
U AU A C
CGCCAGCUU CG
GCUGGU AAG
G G U G A AAUG G U G C GG
U AAGAGCGCACC
GC G C G A C U G G
CAAC
GGCUUGC
GGCACGGUAAACCCC G C C C G G A G C A A G A C C
AAA U AG
G G G A G CAUG U
C C G UCG U G
UCCGA
ACGGGCUCCCGGGUA
GGUUGCUUGAG G U G G C C G G U
GACGGCUAUCCCAGAUGAAUGGUUGUCG A UG ac
agaacccggcuuac
1
2040
60
80
100
120
140
160
180 200
220
240
260
280
gaggaaaguccgggcUCCA
UGGA
AGCGCGGU
GCCGGAUAAC
G UCCGGCGG
GGGCG
A C C U C AG GGA
AA G UGC
CACAGAAAG
C A A A CCGCCCUCGAGGCCGA AA
GGCUUC GCGGAGGGU AAGG G U G A AA
GG G U G C GG
U AAGAGCGCACC
GC G U C U U U G G
CAACAAA
GGCGGCAAGGCAAAC
CCC A C C G G G A C C
AAAU AG
G G G C U GCACG
GACGAGAG A
UCGUCCAG G U C U G
UUUCCAGACCCGCGGCCC
GGGUUGGUU
GCAAGAG G C G U C U C G C
AAGAGGCGUCCCAGAUGAAUGGCCAUC A C
CUCGCAG C AA
UGCGAGG A a c
agaacccggcuua
1
20
40
60
80
100
120
140
160180
200
220
240260
280
300
320
340
A ACG A G1
2
3 4
5
6
7
8
9
10
11
12
1
2
3 4
5
6
7
8
9
top related