improving the prediction of rna secondary structure by detecting and assessing conserved stems
DESCRIPTION
Improving the prediction of RNA secondary structure by detecting and assessing conserved stems. Xiaoyong Fang, et al. Outline. Background Related work Methods Results Discussion. Background. - PowerPoint PPT PresentationTRANSCRIPT
Improving the prediction of RNA secondary structure by detecting and assessing conserved stems
Xiaoyong Fang et al
Outline
Background Related work Methods Results Discussion
Background Non-coding RNAs gained increasing interest
since a huge variety of functions associated with them were found
The function of an RNA molecule is principally determined by its (secondary) structure
Unfortunately the current physical methods available for structure determination are time-consuming and expensive
cont The prediction of RNA secondary structure can
be facilitated by incorporating with comparative analysis of homologous sequences
However most of existing comparative methods are vulnerable to alignment errors and thus are of low accuracy in practical application
And the methods for simultaneous sequence alignment and structure prediction are too computationally taxing to be practical
Related work Zuker et al Mfold 1981 2003 Knudsen B and J Hein Pfold 1999 2003 Hofacker I M Fekete and P Stadler
RNAalifold 2002 Ruan J et al 2004 Knight R et al 2004 Weinberg Z and WL Ruzzo 2004
cont Sankoff D 1985 Hofacker I S Bernhart and P Stadler 2004 Mathews D and D Turner 2002 Mathews D 2005 helliphellipetc
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Outline
Background Related work Methods Results Discussion
Background Non-coding RNAs gained increasing interest
since a huge variety of functions associated with them were found
The function of an RNA molecule is principally determined by its (secondary) structure
Unfortunately the current physical methods available for structure determination are time-consuming and expensive
cont The prediction of RNA secondary structure can
be facilitated by incorporating with comparative analysis of homologous sequences
However most of existing comparative methods are vulnerable to alignment errors and thus are of low accuracy in practical application
And the methods for simultaneous sequence alignment and structure prediction are too computationally taxing to be practical
Related work Zuker et al Mfold 1981 2003 Knudsen B and J Hein Pfold 1999 2003 Hofacker I M Fekete and P Stadler
RNAalifold 2002 Ruan J et al 2004 Knight R et al 2004 Weinberg Z and WL Ruzzo 2004
cont Sankoff D 1985 Hofacker I S Bernhart and P Stadler 2004 Mathews D and D Turner 2002 Mathews D 2005 helliphellipetc
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Background Non-coding RNAs gained increasing interest
since a huge variety of functions associated with them were found
The function of an RNA molecule is principally determined by its (secondary) structure
Unfortunately the current physical methods available for structure determination are time-consuming and expensive
cont The prediction of RNA secondary structure can
be facilitated by incorporating with comparative analysis of homologous sequences
However most of existing comparative methods are vulnerable to alignment errors and thus are of low accuracy in practical application
And the methods for simultaneous sequence alignment and structure prediction are too computationally taxing to be practical
Related work Zuker et al Mfold 1981 2003 Knudsen B and J Hein Pfold 1999 2003 Hofacker I M Fekete and P Stadler
RNAalifold 2002 Ruan J et al 2004 Knight R et al 2004 Weinberg Z and WL Ruzzo 2004
cont Sankoff D 1985 Hofacker I S Bernhart and P Stadler 2004 Mathews D and D Turner 2002 Mathews D 2005 helliphellipetc
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
cont The prediction of RNA secondary structure can
be facilitated by incorporating with comparative analysis of homologous sequences
However most of existing comparative methods are vulnerable to alignment errors and thus are of low accuracy in practical application
And the methods for simultaneous sequence alignment and structure prediction are too computationally taxing to be practical
Related work Zuker et al Mfold 1981 2003 Knudsen B and J Hein Pfold 1999 2003 Hofacker I M Fekete and P Stadler
RNAalifold 2002 Ruan J et al 2004 Knight R et al 2004 Weinberg Z and WL Ruzzo 2004
cont Sankoff D 1985 Hofacker I S Bernhart and P Stadler 2004 Mathews D and D Turner 2002 Mathews D 2005 helliphellipetc
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Related work Zuker et al Mfold 1981 2003 Knudsen B and J Hein Pfold 1999 2003 Hofacker I M Fekete and P Stadler
RNAalifold 2002 Ruan J et al 2004 Knight R et al 2004 Weinberg Z and WL Ruzzo 2004
cont Sankoff D 1985 Hofacker I S Bernhart and P Stadler 2004 Mathews D and D Turner 2002 Mathews D 2005 helliphellipetc
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
cont Sankoff D 1985 Hofacker I S Bernhart and P Stadler 2004 Mathews D and D Turner 2002 Mathews D 2005 helliphellipetc
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Method 1) We detect possible stems in single RNA sequence
using the so-called position matrix with which some possibly paired positions can be uncovered
2) We detect conserved stems across multiple RNA sequences by multiplying the position matrices
3) We assess the conserved stems using the Signal-to-Noise
4) We compute the optimized secondary structure by incorporating the so-called reliable conserved stems with predictions by RNAalifold program
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Given an RNA sequence of length N Seq we build one NtimesN position matrix (denoted by MSeq) by
1) The reverse complement of Seq Seqrsquo is firstly computed from the original sequence
2) We build one NtimesN matrix containing Seqrsquo in the first row The i-th (0 le i le N-1) row contains the sequence generated from Seqrsquo by shifting i position to the left (circular left shift)
Detect possible stems in single RNA sequence
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
3) The position matrix MSeq is computed by comparing Seq with the matrix generated by 2) row by row 0 or 1 or -1 is assigned to the i j (0 le j le N-1) element of MSeq by comparing the j-th character of Seq with the i j (0 le j le N-1) element of the matrix of 2)
The following rules should be obeyed when two characters (b and brsquo) are compared with each other
1048698 If b equals brsquo and neither of them is lsquo_rsquo then 1 is assigned 1048698 If b or brsquo is lsquo_rsquo or both of them are lsquo_rsquo then -1 is assigned 1048698 If b does not equal to brsquo and neither of them is lsquo_rsquo then 0 is a
ssigned
Cont 1
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
We can detect all possible stems in an RNA sequence by scanning the position matrix row by row The key is to find all zones of continuous ldquo1rdquo in the matrix There is a one-to-one mapping between the stems in the sequence and the zones of continuous ldquo1rdquo in the position matrix
Cont 2
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Example
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
The multiplying of the position
matrices M [i j] = M1 [i j] times M2 [i j]
0times0 = 0 0times1 = 0 0times (-1) = 0 1times1 = 1 1times (-1) = 1 (-1)times (-1) = -1
M =
Detecting conserved stems across multiple sequences
n
iiM
1
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
We detect conserved stems shared by all sequences in an alignment by
1) We extract all sequences from the alignment and detect all possible stems in each sequence using the approach described above
2) We select n different stems from n sequences (one stem for one sequence) Here n is the number of sequences in the alignment
3) We build the position matrix for each selected stem using the approach described above and multiply these n matrices according to the rules mentioned above
Cont 1
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
4) We detect conserved stems by finding zones of continuous lsquo1rsquo in the resulting matrix (M) generated by 3) There is still a one-to-one mapping between the conserved stems and the zones of continuous lsquo1rsquo in M
5) We repeat the steps 2) to 4) until all the stems detected by 1) are selected
Cont 2
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Note Actually the multiplying of the position matrices can be
directly used to detect conserved stems in the RNA alignment But here we first extract all sequences from the RNA alignment and then detect conserved stems shared by all sequences The benefit for this is that some conserved stems possibly missed due to alignment errors also can be detected
Cont 3
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
ExampleFigure 2
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
The main steps are 1) We record the number of column pairs (see Figure 3) in
the conserved stem as the Signal
2) We generate a randomized alignment from the original alignment of the conserved stem
3) We detect possibly conserved stems in the new alignment using the approach mentioned above
Assessing the conserved stems using the Signal-to-Noise
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
4) We record the number of column pairs in the conserved stem newly detected by 3) as the Noise The Signal-to-Noise is thus computed by Signal Noise Note that we set Noise to be 1 if there are no conserved stems in the randomized alignment and we set Noise to be the maximum if there are more than one conserved stems
Cont
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
ExampleFigure 3
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
The main steps are 1) We compute a candidate secondary structure using
RNAalifold which is a popular and powerful program for predicting RNA secondary structure at present
2) We extract all sequences from the RNA alignment and detect all possibly conserved stems shared by them
3) We compute the Signal-to-Noise for each conserved stem
4) We remove incompatible conserved stems and determine the reliable conserved stems according to the Signal-to-Noise
Computing the optimized secondary structure
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Here reliable conserved stems refer to following two kind of conserved stems
The conserved stems with very high Signal-to-Noise They are thought to belong to a real RNA secondary structure in our method
The conserved stems with very low Signal-to-Noise They are not thought to belong to any real RNA secondary structure in our method
Cont 1
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
5) We add the base pairs which are included by the first kind of reliable conserved stems but not included by the prediction by RNAalifold into the candidate secondary structure
6) We remove the base pairs which are included by both the second kind of reliable conserved stems and the prediction by RNAalifold from the candidate secondary structure
7) We compute the final secondary structure by merging the results of 5) and 6)
Cont 2
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Results We construct one data set of three-sequence alignments one
data set of four-sequence alignments and one data set of five-sequence alignments on the basis of Rfam 80
We compute the sensitivity as the number of true positives divided by the sum of true positives + false negatives and the specificity as the number of true positives divided by the sum of true positives + false positives
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Cont 1
Table 1 - Sensitivity and specificity on data set of three-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7598 7886 7497 7784
50-60 5029 5806 4859 5130
60-70 7317 6989 7279 6552
70-80 5667 4988 5587 4434
80-90 6796 6616 6782 6141
90-100 6256 5461 5944 5180
Total 6444 6291 6325 5870
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Cont 2
Table 2 - Sensitivity and specificity on data set of four-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7516 8996 7460 8684
50-60 4865 5043 4348 4448
60-70 7488 5921 7456 5664
70-80 6501 5011 6190 4151
80-90 7318 6711 7151 6410
90-100 5028 5103 4870 4122
Total 6453 6312 6246 5580
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Cont 3
Table 3 - Sensitivity and specificity on data set of five-sequence alignmentsId percentage identity Se sensitivity Sp specificity
Id () Se () Sp () SeRNAalifold () SpRNAalifold ()
lt50 7256 8988 7253 8893
50-60 5101 5673 4920 4927
60-70 7691 7315 7634 7218
70-80 6499 4893 6209 3964
80-90 7176 6698 7171 6674
90-100 5135 5101 4732 4205
Total 6476 6445 6315 5980
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Discussion In this paper we present a stem-based
method to improve the prediction of RNA secondary structure The central idea of our method is to detect and assess conserved stems shared by all sequences in the input RNA alignment and then to compute the optimized secondary structure by incorporating reliable conserved stems with predictions by RNAalifold
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Cont 1 Our method improves RNAalifold by two major
means 1) to add some conserved stems which are possibly missed by RNAalifold for improving the sensitivity 2) to remove some conserved stems which are possibly mistaken by RNAalifold for improving the specificity
We tested our method on data sets of RNA
alignments taken from the Rfam database Experimental results suggest that our method can predict RNA secondary structure with much better performance than RNAalifold
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Cont 2 In future work our method can be improved by
1) To compute the Signal-to-Noise in more effective ways
2) To determine the reliable conserved stems by more intelligent approaches
3) To devise new parallel algorithms to speed up the method
4) Further benchmark for evaluating the performance of the method
Thank you very much
Please contact xyfangnudteducn for any
questions
Thank you very much
Please contact xyfangnudteducn for any
questions