undergrad research paper
A phylogenetic study of small RNA Tpke11 using Secondary Structures
Abstract
Small RNA (sRNA) has recently become a focus of research, even though the noncoding
regions of genomes were once thought to be useless. These molecules are found in all domains of life and in
many different types of organisms. sRNAs have many known functions, including gene
regulation, catabolism, and cell structure. However, many of the cellular functions and characteristics of
sRNAs are unknown. This study used bioinformatics and award-winning
phylogenetic methods to understand the evolutionary history of not only the primary structure of the molecule family
tpke11 but also its secondary structures. Secondary structures relate to the
function of the molecule and can deepen our understanding of this RNA family. The
methods applied in this study combine geometrical and statistical data to generate trees of
molecules that investigate the phylogenetic history of the domains Eukarya and Bacteria. This study
expands our understanding of this unique gene family and also demonstrates how
phylogenetic analysis of RNA structure can provide evidence in support of the existing
tree of life.
Background
Small RNAs (sRNAs) have many functions, including metabolic, regulatory, and catabolic roles. Small RNA is
found in many different organisms and in all domains of life. However, many of these noncoding RNA
molecules are uncharacterized, and their functions are unknown. The molecule tpke11 is an sRNA
family that has received little research. It was first predicted theoretically and then
detected experimentally, first in E. coli and a select species of yeast, using microarray and Northern
blotting techniques. However, many of its sequences in other species have only been predicted theoretically
and still need to be confirmed in the laboratory. The tpke11 family is involved in gene
regulation: this sRNA binds to the chaperone protein Hfq and regulates the translation of sigma
factor 38 (Griffiths-Jones & Bateman, 439-441). Without this sigma factor, RNA polymerase cannot
bind to the promoters that it recognizes, and transcription from them cannot occur (Hershberg, 1813-1820). Recent
studies found that sRNAs in Erwinia amylovora contribute to fire blight, a disease of apple and pear
trees. The production of Hfq leads to the binding of RNA polymerase, which in turn leads
to the synthesis of proteins that make this bacterium pathogenic (Zeng & Sundin, 414).
The research in this study, “A phylogenetic study of small RNA Tpke11 using
Secondary Structures”, was based on the work of Caetano-Anollés. His research was the
first to analyze the phylogenetic evolution of RNA structural features across a whole molecule. His work rests
on the assumption that ancestral features are the most stable. By relating shape to
function, his research shows how mutations can change features within a molecule and from species to species. His
work not only shows that features change, but also emphasizes the timeline of when
mutations occur, which is often overlooked (Pollock, 375-376). Caetano-Anollés’s research
and contributions earned him the Zuckerkandl Prize in 2002 for his paper “Evolved RNA
Secondary Structure and the Rooting of the Universal Tree of Life” (The Zuckerkandl Prize,
375).
Building on the paper “Evolved RNA Secondary Structure and the
Rooting of the Universal Tree of Life”, and with the help of Caetano-Anollés, Feng-Jie Sun studied
the molecular evolution of 5S rRNA. This molecule is found in all domains and is commonly
used in research on RNA structure. Using the methods developed by Caetano-Anollés,
Feng-Jie Sun created a secondary structure phylogenetic tree from a total of 46 features. This work
showed that the domain Archaea developed first and that the domains Bacteria and Eukarya were
monophyletic and derived. It also revealed a link between the 5S rRNA helices and the
age of ribosomal proteins. Examining the function and structure of 5S rRNA together with its primary
structure showed that 5S rRNA developed late in evolution, just before the
separation of the Archaea domain (Fengjie & Caetano-Anollés, 430).
The objectives of the study “A phylogenetic study of small RNA Tpke11 using Secondary
Structures” were to see how the award-winning methods could be applied to tpke11 and
whether these methods could be replicated; whether the domains of the molecule would be maintained
as predicted by Caetano-Anollés and Feng-Jie Sun; and whether any evolutionary knowledge could be
gained by looking not only at the primary sequence evolution of a molecule, but also at its secondary
structures.
Methods
First, the Rfam database was searched for an RNA family, with the search restricted to
small RNAs. Tpke11 was selected because it is represented in more than one domain of life. From the
570 species sequences available, 30 sequences were hand-selected and saved in a text file. This text file
was then turned into a sequence matrix listing each species' accession number where available (Table 1).

Next, these sequences were converted into FASTA format and aligned with the ClustalX program in Molecular Evolutionary
Genetics Analysis (MEGA6). From the alignment, three phylogenetic trees were built from the primary structure
of the species: neighbor joining, maximum parsimony, and maximum likelihood
(Phylogenetic Tree1, Tree2, and Tree3).

After the primary structure trees were created, the secondary structure of each sequence was predicted with the
RNAfold server of the ViennaRNA Web Services, using default settings (Figure 2). Once all 30 sequences
had theoretical secondary structures, the minimum free energy (MFE) structure of each was used to define feature
characters. The features consisted of stems, internal loops, terminal loops, and bulges (Figure 1). Starting at the
3′ end and finishing at the 5′ end of each structure, the nucleotides in each feature were counted and recorded in an
Excel spreadsheet. When defining features, bulges took the highest priority and stems the lowest. There were 28
features in total.

The spreadsheet was then turned into a features matrix (Matrix1). Each cell records the number of nucleotides
in a feature: counts of 1 to 9 are written as digits, counts of 10 or more are written as the equivalent letter of the
alphabet (A for 10, B for 11, and so on), and a feature not found in a particular species' structure is written as 0.
This features matrix was then given to Feng-Jie Sun, who created a secondary structure phylogenetic tree
(Phylogenetic Tree4).
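In this study the features were counted by hand from each MFE structure. As an illustration only, the sketch below shows one way comparable counts could be derived from the dot-bracket string that RNAfold prints. It is not the procedure used here, and several choices are assumptions: the function names are hypothetical, features are listed from the 5′ end rather than from the 3′ end, the bulge-over-stem priority rule is not modeled, unpaired bases outside any helix are ignored, and multibranch loops (not one of the four feature types in this study) are reported as their own kind.

```python
# Hypothetical sketch: derive stem and loop sizes from an RNAfold dot-bracket string.
# Assumes a pseudoknot-free structure written only with "(", ")" and ".".
# Feature order is 5' to 3' (this study counted from 3' to 5'); priority rules are not modeled.


def pair_table(db):
    """Map each paired position (0-based) to its partner."""
    stack, pairs = [], {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def enclosed_pairs(pairs, i, j):
    """Base pairs directly enclosed by the pair (i, j), skipping deeper nesting."""
    found, k = [], i + 1
    while k < j:
        if k in pairs and pairs[k] > k:
            found.append((k, pairs[k]))
            k = pairs[k] + 1  # jump over the whole enclosed helix
        else:
            k += 1
    return found


def extract_features(db):
    """List (feature, nucleotide count) tuples for each stem and the loop it closes."""
    pairs = pair_table(db)
    features = []
    for i in sorted(p for p, q in pairs.items() if p < q):
        j = pairs[i]
        if pairs.get(i - 1) == j + 1:
            continue  # (i, j) stacks on an outer pair, so it belongs to an earlier stem
        n = 1  # number of stacked base pairs in this stem
        while i + n < j - n and pairs.get(i + n) == j - n:
            n += 1
        features.append(("stem", 2 * n))
        a, b = i + n - 1, j - n + 1  # innermost pair of the stem closes the next loop
        inner = enclosed_pairs(pairs, a, b)
        if not inner:
            features.append(("terminal_loop", b - a - 1))
        elif len(inner) == 1:
            inner_i, inner_j = inner[0]
            left, right = inner_i - a - 1, b - inner_j - 1
            kind = "bulge" if left == 0 or right == 0 else "internal_loop"
            features.append((kind, left + right))
        else:
            unpaired = (b - a - 1) - sum(y - x + 1 for x, y in inner)
            features.append(("multiloop", unpaired))
    return features


# Made-up structure: a 2-pair stem, a 2+2 internal loop, a 2-pair stem, a 3-nt terminal loop.
print(extract_features("((..((...))..))"))
```

Counts obtained this way would still need to be mapped onto the 28 aligned feature positions used in Matrix1, a step that was done manually in this study.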
Figure 1
Using the species sequences (Table 1), trees, secondary structures, and matrices were created.
Data
The data collected in this research consist of the 30 species sequences and their accession numbers
(Table 1).
Species Name and Accession Number, followed by Species Sequence
>Oryza_sativa AAK92623
CCCGGAAACCAGCACGGGCGUCGAGGCAACUCUGCGCCCGUGCACGCAUGUUAAGGGUAAGCGAAAGAAU
>Nasonia_vitripennis XP_001606835
AACAGUUACUGAUACCGAAACACGGGCGUCGAGGCAACUCUACGCCCGUGUGCUUAUGUCAAGGGUAAUCCAGAAUGG
>Cucumbis_sativus BAB19275
UGCAGGGUAAUUAAUCGGCACGGGCGUAGGAGUAUUCUCCACGCCCGUGCUCGCAUGUUAAGGGGCUUAAAAAAACC
>E_coli TW10598
AACGGGUAAUUAUACUGACACGGGCGAAGGGGAAUUUCCUCUCCGCCCGUGCAUUCAUCUAGGGGCAAUUUAAAAAAG
>Pantoea_ananatis D90087
GACAGCCGUUGAAACCAGCACGGGCGUAGGAGUUUUCUCCACGCCCGUGCGAUCAUGUUAAGGGCUAAAUAAAUGGC
>Pectobacterium_wasabiae WPP163
UAUUGCUGGCAAUCAGCACGGGCGUCGAGGAAACUCUGCGCCCGUGCACGCAUGUUGAGGGGCAGGAAAAGAAAU
>Proteus_panneri AF324468
AAUAGUUACUGAUACCGAACACGGGCGUCGAGGAAACUCAACGCCCGUGUGCUUAUGUCAAGGGUAAUCUAGAAUGG
>Salmonella_bongori FR877557.1
AACGGGUAAUUACUGGCACGGGCGAAGAGGUUUCCUCUCCGCCCGUGCAUGCAUGUUAAGGGCAGAUAAAAAAAG
>Baphidicold_Str
UAUAGACUUUAAACUAGCACGGGCGUAGAUAAACuUCUGCGCCCGUGUAAUUAUUUUUAAAUCAGGUAGAGUAAG
>Klebsiella_pneumoniae CP000964
AACGGGUAAAACACUGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCACGCAUGUUAAGGGGCUGAAAAACAC
>Escherichia_fergusonii AEVY01000001.1
AACGGGUAAUUAUACUGACACGGGCGAAGAGGaaUUUcCUCUCCGCCCGUGCAUUCAUCUAGGGGCAAAUCAAAAAAG
>Enterobacter_sp 638
UGCAGGGUAAUUAAUCGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCUCGCAUGUUAAGGGGCUGAAAAAACCA
>Cronobacter_sakazakii BAA-894
CAGGGUAGAUAACACUGGCACGGGCGUAGGAGUUUcUCCACGCCCGUGCACGCAUGUCAGGGGCAGAUAAAACAUG
>Erwinia_tasmaniensis
AAUUAACCAGCACGGGCGUAGGAUUUUaUCCACGCCCGUGCACGCAUGUUAAGGGCAGGAUCAAUAAC
>Photorhabdus_luminescens BX571860.1
AACUGUUGcuGAUUAACCGGCACGGGCGUUGAGGGAACUCUGCGCCCGUGCGCGCAUGUUAAGGGUAAAAUAAGAGAU
>Sodalis_glossinidius
AACAGGCACGGGCGUCGGGGAAAcUCUGCGCCCGUGCAAGCAUGUUGAGGGCAGGAAGAGACAA
>Pectobacterium_carotovorum PBR1692
AAUCAGCACGGGCGUCGAGGAAACUCUACGCCCGUGCACGCAUGUUGAGGGGCAGGAAAAGAAA
>Pectobacterium_atrosepticum SCRI1043
AAUCAGCACGGGCGUCGAGGAAACUCUGCGCCCGUGCACGCAUGUUgAAGGGCAGGAAAAGAAAU
>Klebsiella_pneumoniae 342
AACGGGUAAAACACUGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCACGCAUGUUAAGGGGCUGAAAAACAC
>Citrobacter_koseri CP000822.1
AACGGGUAAAAUACUGGCACGGGCGAAGAGGGUUcCUCUCCGCCCGUGCACGCAUGUUAGGGGCAGAUAAAAACAA
>Salmonella_enterica
AACGGGUAAUUACUGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCAUGCAUGUUAAGGGCAGAUAAAAAGAG
>Escherichia_albertii TWO 7627
AACGGGUAAUUAUACUGACACGGGCGAAGGGGauUUUcCUCUCCGCCCGUGCAUUCAUCCAGGGGCAAAUAAAAAGAG
>Providencia_stuartii 25827
AAUGGUUACUGAUACCgAACACGGGCGUCGAGGAAACUCCACGCCCGUGUCGGCAUGUUAAGGGUAAAAUAAAUGGC
>Proteus_penneri 35198
AAUAGUUACUgAUACCGAACACGGGCGUCGAGGAAACUCAACGCCCGUGUGCUUAUGUCAAGGGUAAUCUAGAAUGG
>Proteus_mirabilis
AACAGUUACUGAUACCGaaACACGGGCGUCGAGGCAACUCUACGCCCGUGUGCUUAUGUCAAGGGUAAUCCAGAAUGG
>Buchnera_aphidicola
AUUAGAGUAUuUAAACCAGCACGGGCGUAGAAAAAAUCUACGCCCGUGUACUUAUUUUGAAAGCAGGAAGAAUAGC
>Providencia_alcalifaciens DSM 30120
GACUGAUACCGGACACGGGCGUUGAGGGAACUCUGCGCCCGUGUCUGCAUGUUAAGGGAAAAAUAAAUGGC
>Serratia_proteamaculans 568
UACCGCAAACCAGCACGGGCGUCGAGGCAACUCUACGCCCGUGCACGCAUGUUAAGGGUUACAGAAAUAAU
>Yersinia_pseudotuberculosis IP 31758 CP000720.1
UACUGUUGCUgGAAACCAGCACGGGCGUCGAGGAAACUCUACGCCCGUGCACGCAUGUUAAGGGUAGGAAAAAAGAG
>Citrobacter_youngae 29220
UGCAGGGUAGUUAAUCGGCACGGGCGUAGAGGUUUcCUCUUCGCCCGUGCUUGUAUGUUAAGGGGCUUAAAAAAAC
Table 1
Table 1 lists each species name, its accession number where available, and its tpke11 sequence. The sequences were obtained from Rfam; there are 30 sequences in total.
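Before alignment, the Table 1 entries were converted into FASTA format. The sketch below shows one way to do that conversion in Python; it is not the tool used in this study, and `to_fasta` is a hypothetical helper. Two choices are assumptions: lower-case letters (which appear in several Table 1 sequences) are simply upper-cased, and any character other than A, C, G, or U is rejected.

```python
# Hypothetical helper: write (species, accession, sequence) entries as FASTA text.
# Assumption: lower-case letters carry no information needed for alignment, so they are upper-cased.

VALID_BASES = set("ACGU")


def to_fasta(records, width=60):
    """Return FASTA text with sequences wrapped at `width` characters per line."""
    lines = []
    for name, accession, seq in records:
        seq = seq.upper()
        unexpected = set(seq) - VALID_BASES
        if unexpected:
            raise ValueError(f"{name}: unexpected characters {sorted(unexpected)}")
        lines.append(f">{name} {accession}" if accession else f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


entries = [
    ("E_coli", "TW10598",
     "AACGGGUAAUUAUACUGACACGGGCGAAGGGGAAUUUCCUCUCCGCCCGUGCAUUCAUCUAGGGGCAAUUUAAAAAAG"),
    ("Sodalis_glossinidius", "",  # no accession number is listed in Table 1
     "AACAGGCACGGGCGUCGGGGAAAcUCUGCGCCCGUGCAAGCAUGUUGAGGGCAGGAAGAGACAA"),
]
print(to_fasta(entries), end="")
```

Keeping the accession number in the header line means it travels with the sequence into the alignment and tree files.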
Results
Phylogenetic Tree1
Phylogenetic Tree1 is a Neighbor Joining tree built from the distances between pairs of taxa.
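Neighbor joining begins from a matrix of pairwise distances. The sketch below is not MEGA6's implementation: it computes uncorrected p-distances from aligned sequences, skipping any column with a gap in either sequence of a pair (one common choice, assumed here), and then picks the first pair that neighbor joining would join by minimizing its Q criterion, Q(i, j) = (n − 2)·d(i, j) − Σₖ d(i, k) − Σₖ d(j, k). All function names are hypothetical.

```python
# Sketch of the distance step behind neighbor joining (not MEGA6's code).


def p_distance(a, b):
    """Fraction of compared sites that differ; columns with a gap in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differ = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differ += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differ / compared


def distance_matrix(alignment):
    """Return the species names and the full pairwise p-distance matrix."""
    names = list(alignment)
    return names, [[p_distance(alignment[r], alignment[c]) for c in names] for r in names]


def first_nj_join(names, d):
    """Pick the pair that minimizes the neighbor-joining Q criterion."""
    n = len(names)
    totals = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best[1], best[2]


# Made-up distances for five taxa.
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(first_nj_join(["a", "b", "c", "d", "e"], D))  # ('a', 'b')
```

A full neighbor-joining run repeats this step, replacing the joined pair with a new node and recomputing distances, until only one pair remains.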
Phylogenetic Tree2
Phylogenetic Tree2 is a Maximum Likelihood tree, chosen as the tree under which the observed sequences are most probable given a model of evolution.
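Maximum likelihood can be illustrated in its smallest form: two aligned sequences under the Jukes-Cantor (JC69) model. The study does not state which substitution model was used in MEGA6, so JC69 here is an assumption chosen for simplicity, and the function names are hypothetical. Under JC69 the chance that a site differs after d substitutions per site is p(d) = ¾(1 − e^(−4d/3)), and the likelihood is maximized at d̂ = −¾ ln(1 − 4p̂/3), where p̂ is the observed fraction of differing sites.

```python
import math

# Two-sequence illustration of maximum likelihood under the JC69 model (an assumed model).
# A real ML tree search optimizes many branch lengths and tree shapes at once.


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of an aligned pair separated by d substitutions per site."""
    p = 0.75 * (1 - math.exp(-4 * d / 3))  # probability that a site differs
    return n_diff * math.log(p / 3) + (n_sites - n_diff) * math.log(1 - p)


def jc69_distance(n_sites, n_diff):
    """Closed-form maximum-likelihood estimate of d under JC69."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69")
    return -0.75 * math.log(1 - 4 * p / 3)


# Made-up example: 20 differences over 100 aligned sites.
d_hat = jc69_distance(100, 20)
print(round(d_hat, 4))
```

Because p(d) increases with d, maximizing the likelihood over p and then converting back to d gives the same answer as maximizing over d directly.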
Phylogenetic Tree3
Phylogenetic Tree3 is a Maximum Parsimony tree, chosen as the tree that requires the fewest mutations to explain the sequences.
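Maximum parsimony compares candidate trees by the minimum number of changes each one needs. The sketch below uses the Fitch algorithm to score fixed trees; it is not MEGA6's search, and the tree encoding (nested pairs with species names at the tips), the function names, and the toy sequences are all hypothetical. Gaps, if present, are treated as an extra state.

```python
# Fitch small-parsimony count on a fixed, rooted binary tree (illustrative sketch).
# Trees are nested 2-tuples with species names at the leaves; a gap counts as one more state.


def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes, summed over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


toy = {"A": "AAG", "B": "AAG", "C": "GAU", "D": "GCU"}  # made-up aligned sequences
print(parsimony_score((("A", "B"), ("C", "D")), toy))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), toy))  # 5
```

Of these two trees, parsimony prefers the first because it explains the same columns with fewer changes; a full search repeats this scoring over many tree shapes.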
These phylogenetic trees show that the eukaryotic domain was not maintained. The four eukaryotic species are
spread throughout each tree and do not form a single group with a shared ancestor. The species Cucumis sativus is placed
the farthest from the other species in every tree created.
Figure 2
Figure 2 shows an example of the secondary structure of Oryza sativa generated by the ViennaRNA Web Services.
Species Name Matrix Features
Oryza_sativa 03000020247A001130000400000D
Nasonia_vitripennis 4A030000055A0011300004000009
Cucumbis_Sativus 1200000004440000G5411400000B
A_nasoniea 550000000464100034400400000A
E_coli 230000000008001160000300000R
Pantoea_ananatis 34001100457E010000000400000B
Pectobacterium_wasabiae 29020000045A001130000400000A
Proteus_panneri 2C030000045A0011300004000006
Salmonella_bongori 130000000009001150000313900A
Klebsiella_pneumoniae 1300000004640012B00013000008
Baphidicold_Str GD0000000000000000000627A001
Escherichia_fergusonii 230000000008001160000300000R
Enterobacter_sp 23000000054G2011500003000009
Cronobacter_sakazakii DE1000003000000000000354300B
Erwinia_tasmaniensis 7E1000000000000000000300000R
Photorhabdus_luminescens 45002111H003000000000400000A
Sodalis_glossinidius 5A0011000000000000000454D001
Pectobacterium_carotovorum 5A0011003000000000000454400A
Pectobacterium_atrosepticum 5A0011003000000000000454400B
Citrobacter_koseri 2300000034640012B00013000008
Salmonella_enterica 130000003009001150000313900A
Escherichia_albertii 230000000008001160000300000R
Providencia_stuartii 5m3211000000000000000400000A
Proteus_penneri 2C000200045A0011300004000006
Proteus_mirabilis 4A010000055A0011300004000009
Buchnera_aphidicola 12000000093D000000000500000G
Providencia_alcalifaciens 35000000005D001130000400000H
Serratia_proteamaculans 16020000050A001130000400030C
Yersinia_pseudotuberculosis 040011003009000090000400009B
Citrobacter_youngae 5300001050340011900000000309
Matrix1
Matrix1 lists the 28 secondary structure features recorded for each species.
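The feature strings in Matrix1 can be read back into nucleotide counts, and rows that are character-for-character identical can be spotted automatically; such species cannot be separated by any tree built from this matrix alone. The sketch below is illustrative and its helper names are hypothetical. It assumes letters continue the count after 9 (A = 10, B = 11, and so on) and treats a lower-case letter, such as the "m" in the Providencia_stuartii row, like its upper-case form, which is also an assumption. The example rows are copied from Matrix1.

```python
from collections import defaultdict

# Hypothetical helpers for reading Matrix1 rows.
# Assumptions: 'A' = 10, 'B' = 11, ...; lower-case letters mean the same as upper-case ones.


def decode_row(row):
    """Convert a feature string into a list of nucleotide counts."""
    counts = []
    for ch in row:
        if "0" <= ch <= "9":
            counts.append(int(ch))
        elif ch.isascii() and ch.isalpha():
            counts.append(ord(ch.upper()) - ord("A") + 10)
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return counts


def identical_rows(matrix):
    """Group species whose feature strings are exactly the same."""
    groups = defaultdict(list)
    for name, row in matrix.items():
        groups[row].append(name)
    return [names for names in groups.values() if len(names) > 1]


subset = {
    "E_coli": "230000000008001160000300000R",
    "Escherichia_fergusonii": "230000000008001160000300000R",
    "Escherichia_albertii": "230000000008001160000300000R",
    "Pectobacterium_carotovorum": "5A0011003000000000000454400A",
    "Pectobacterium_atrosepticum": "5A0011003000000000000454400B",
}
print(identical_rows(subset))
```

On this subset, the three Escherichia rows fall into one group, while the two Pectobacterium rows differ only in their final feature.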
Phylogenetic Tree4
Phylogenetic Tree4 was made from the secondary structure features matrix (Matrix1).
This secondary structure tree places the four eukaryotic species much closer together than the primary structure trees do. However, although the species are closer, the domains were still not maintained.
Discussion
The primary structure phylogenetic trees show that the two domains were not maintained
in any of the trees. The four eukaryotic species were spread throughout the trees, with
Cucumis sativus the farthest from the other species. The prediction for the domains, based
on the work of Feng-Jie Sun, was that the prokaryotic species would be rooted in a separate
taxon from the four eukaryotic species. Separately rooted taxa would show that these domains evolved
at different times. However, the trees created in this research did not show these
evolutionary distinctions, which suggests experimental error. Further reading on the
molecule tpke11 revealed that many of the sequences found with the microarray technique
were false positives (Griffiths-Jones & Bateman, 439-441). This means that the microarray
may have reported a gene that is not actually present. Each sequence needs to be examined
further, and only species confirmed to have this gene should be used for phylogenetic
analysis.
The secondary structure tree shows the four eukaryotic species clustered closer together.
However, there is still no distinction between the two domains. Secondary structure and shape
relate to the function of the molecule, so the eukaryotic species could cluster more closely because
tpke11 functions differently in eukaryotic species than in prokaryotic species. The molecule tpke11
could act as a regulator in both domains but regulate the expression of different genes in each.
If so, this would suggest that the eukaryotic species evolved at a different time than the prokaryotic
species, which is consistent with the research of Feng-Jie Sun and Caetano-Anollés.
Conclusion
In this research study, the phylogenetic evolution of tpke11 was analyzed using primary
and secondary structures. Using award-winning methods, both primary structure and secondary
structure trees were created. The primary trees showed that the prokaryotic
and eukaryotic species did not develop independently. However, the secondary structure tree
suggested that the eukaryotic species could have evolved much more independently from the
prokaryotic species. This difference reflects the shape of the molecule: shape relates to function,
and the secondary structure tree suggested that the eukaryotic species evolved differently than most of the
prokaryotic species analyzed in this research. The process in this study was replicable with tpke11 using the methods
in “Evolved RNA Secondary Structure and the Rooting of the Universal Tree of Life”. Several directions
remain for future work. Each species' sequence needs to be analyzed individually to see
whether there are any errors that account for the skewed primary trees. Only species that are
confirmed with multiple laboratory techniques should be analyzed. A molecule found in all three domains
would be preferable, to see how all three domains are connected in their phylogenetic structures.
Lastly, more than 30 species will need to be analyzed so that a wider span of nucleotide sequences can be
studied and the domains distinguished.
References
Griffiths-Jones, S, & Bateman, A, September 2002, 'Rfam Database website', Nucleic Acids Research, 31, 1, pp. 439-441.
Fengjie, S, & Caetano-Anollés, G 2009, 'The Evolutionary History of the Structure of 5S Ribosomal RNA', Journal Of Molecular Evolution, 69, 5, p. 430, Advanced Placement Source, EBSCOhost, viewed 29 November 2014.
Hershberg, R, Altuvia, S, & Margalit, H 2003, 'A Survey Of Small RNA-Encoding Genes In Escherichia Coli', Nucleic Acids Research, 31, 7, pp. 1813-1820, MEDLINE with Full Text, viewed 20 November 2014.
Pollock, DD 2003, 'The Zuckerkandl Prize: Structure and Evolution', Journal Of Molecular Evolution, 56, 4, pp. 375-376, Academic Search Complete, EBSCOhost, viewed 29 November 2014.
Zeng, Q, & Sundin, G 2014, 'Genome-wide identification of Hfq-regulated small RNAs in the fire blight pathogen Erwinia amylovora discovered small RNAs with virulence regulatory function', BMC Genomics, 15, p. 414, MEDLINE with Full Text, EBSCOhost, viewed 29 November 2014.
The Zuckerkandl Prize: Structure and Evolution 2003, Journal Of Molecular Evolution, 56, 4, p. 375.