undergrad research paper

A Phylogenetic Study of Small RNA Tpke11 Using Secondary Structures

Abstract

Small RNA (sRNA) has recently become a focus of research, although the noncoding regions of genomes were once thought to be useless. These molecules are found in all domains and in many different types of organisms. sRNAs have many known functions, including gene regulation, catabolism, and cell structure. However, many of the cellular functions and characteristics of sRNAs are unknown. This study used bioinformatics and award-winning phylogenetic methods to understand the evolutionary history of not only the primary structure of the tpke11 molecule family but also its secondary structures. Secondary structure relates to the function of a molecule and can reveal more about this RNA family. The methods applied in this study combine geometrical and statistical data to generate trees of molecules that reveal the phylogenetic history of the domains Eukarya and Bacteria. This study is important for expanding our understanding of this unique gene family, and it also demonstrates how the phylogenetic analysis of RNA structural evidence can be applied in support of the existing tree of life.

Background

Small RNAs (sRNAs) have many functions, including metabolic, regulatory, and catabolic roles. They are found in many different organisms and in all domains. However, many of these non-coding RNA molecules are uncharacterized, and their functions are unknown. The tpke11 molecule belongs to an sRNA family on which little research has been conducted. It was first predicted theoretically and then found experimentally in E. coli and a select species of yeast using microarray and Northern blotting techniques. However, the sequences for many species are still only theoretical predictions and need to be confirmed in the laboratory. The tpke11 family is involved in gene regulation. This sRNA binds to the chaperone protein Hfq and regulates the translation of sigma 38 (Griffiths-Jones & Bateman, 439-441). Without this sigma factor, RNA polymerase cannot bind to the promoter site and transcription cannot occur (Hershberg, 1813-1820). A recent study found that sRNAs in Erwinia amylovora are responsible for fire blight disease in apples and pears. The production of Hfq leads to the binding of RNA polymerase, which leads to the synthesis of the proteins that make this bacterium pathogenic (Zeng & Sundin, 414).

This study was based on the research of Caetano-Anollés. His research was the first to analyze the phylogenetic evolution of RNA features across a whole molecule. His work rests on the assumption that ancestral features are the most stable. By using shape and function, it shows how mutations can change features within a molecule and from species to species. His work shows not only that features morph but also emphasizes the timeline of when the mutations occur, which is often overlooked (Pollock, 375-376). Caetano-Anollés's research and contributions earned him the Zuckerkandl Prize in 2002 for his paper “Evolved RNA Secondary Structure and the Rooting of the Universal Tree of Life” (The Zuckerkandl Prize, 375).

Building on the paper “Evolved RNA Secondary Structure and the Rooting of the Universal Tree of Life”, and with the aid of Caetano-Anollés, Fengjie Sun looked at the molecular evolution of 5S rRNA. This molecule is found in all domains and is commonly used in research on RNA structure. Using the methods developed by Caetano-Anollés, Fengjie Sun created a secondary structure phylogenetic tree with a total of 46 features. This work showed that the domain Archaea developed first and that the domains Bacteria and Eukarya were monophyletic and derived. It also showed a link between 5S rRNA helices and the age of ribosomal proteins. Examining the function and structure of 5S rRNA along with its primary structure indicated that 5S rRNA had a late evolutionary development, just before the clan separation of the domain Archaea (Fengjie & Caetano-Anollés, 430).

The objectives of this study were to see how the award-winning methods could be applied to tpke11 and whether these methods could be replicated; whether the domains of the molecule would be maintained as predicted by Caetano-Anollés and Fengjie Sun; and whether any evolutionary knowledge can be gained by looking not only at the primary structure evolution of a molecule but also at its secondary structures.

Methods

First, the Rfam database was used to find an RNA molecule. The search on Rfam was specific to small RNA. Tpke11 was selected because it is found in more than one domain. From the 570 species sequences, 30 sequences were selected by hand and saved in a text file. This text file was then turned into a sequence matrix with each species' accession number, if available (Table 1). Next, these sequences were converted into FASTA format and, using Molecular Evolutionary Genetics Analysis (MEGA6), aligned with the ClustalX program. Once the sequences were aligned, phylogenetic trees of the species were created from the primary structure (Phylogenetic Tree1, 2, and 3). Three trees were created: neighbor joining, maximum parsimony, and maximum likelihood. After the primary structure trees were created, ViennaRNA Web Services was accessed, and the secondary structures were generated with the RNAfold server on its default settings (Figure 2). Once all 30 sequences had theoretical secondary structures, the MFE structure was used to name the feature characters. The features consisted of stems, internal loops, terminating loops, and bulges (Figure 1). Starting at the 3 prime end and finishing at the 5 prime end of each structure, the nucleotides in each feature were counted and recorded in an Excel spreadsheet. Bulges took the highest priority when determining the features, whereas stems took the lowest. There were a total of 28 features. The spreadsheet was then turned into a features matrix (Matrix1). In this matrix, counts of 1 to 9 nucleotides were written as digits, and counts of 10 or more were written as the equivalent letter of the alphabet. A feature that was not found in a particular species' structure was represented by a zero. This features matrix was then given to Fengjie Sun, and a secondary structure phylogenetic tree was created (Phylogenetic Tree4).
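The character coding used for the features matrix can be sketched in Python. This is a minimal illustration, not the procedure used in the study, where the counts were recorded in a spreadsheet. It assumes that the letter equivalents start at A for 10 (B for 11, and so on); the names SYMBOLS, encode_count, and encode_row are hypothetical.

```python
import string

# Assumed symbol table: '0' marks an absent feature, '1'-'9' are literal
# counts, and 'A', 'B', ... stand for 10, 11, ... (an assumption here).
SYMBOLS = string.digits + string.ascii_uppercase


def encode_count(count):
    """Return the single-character code for one feature's nucleotide count."""
    if not 0 <= count < len(SYMBOLS):
        raise ValueError(f"count {count} cannot be coded as one character")
    return SYMBOLS[count]


def encode_row(counts):
    """Encode the per-feature counts of one species as one matrix row."""
    return "".join(encode_count(count) for count in counts)


# Illustrative counts only, not data from the study.
print(encode_row([0, 3, 10, 13]))
```

Under these assumptions, a species with 28 recorded features yields a 28-character row, one character per feature.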

Figure 1

Using the species sequences (Table 1), trees, secondary structures, and matrices were created.

Data

The data collected in this research consist of the 30 species sequences and their accession numbers (Table 1).

Species Name and Accession Number (header line), Species Sequence (following line)

>Oryza_sativa AAK92623
CCCGGAAACCAGCACGGGCGUCGAGGCAACUCUGCGCCCGUGCACGCAUGUUAAGGGUAAGCGAAAGAAU

>Nasonia_vitripennis XP_001606835
AACAGUUACUGAUACCGAAACACGGGCGUCGAGGCAACUCUACGCCCGUGUGCUUAUGUCAAGGGUAAUCCAGAAUGG

>Cucumbis_sativus BAB19275
UGCAGGGUAAUUAAUCGGCACGGGCGUAGGAGUAUUCUCCACGCCCGUGCUCGCAUGUUAAGGGGCUUAAAAAAACC

>E_coli TW10598
AACGGGUAAUUAUACUGACACGGGCGAAGGGGAAUUUCCUCUCCGCCCGUGCAUUCAUCUAGGGGCAAUUUAAAAAAG

>Pantoea_ananatis D90087
GACAGCCGUUGAAACCAGCACGGGCGUAGGAGUUUUCUCCACGCCCGUGCGAUCAUGUUAAGGGCUAAAUAAAUGGC

>Pectobacterium_wasabiae WPP163
UAUUGCUGGCAAUCAGCACGGGCGUCGAGGAAACUCUGCGCCCGUGCACGCAUGUUGAGGGGCAGGAAAAGAAAU

>Proteus_panneri AF324468
AAUAGUUACUGAUACCGAACACGGGCGUCGAGGAAACUCAACGCCCGUGUGCUUAUGUCAAGGGUAAUCUAGAAUGG

>Salmonella_bongori FR877557.1
AACGGGUAAUUACUGGCACGGGCGAAGAGGUUUCCUCUCCGCCCGUGCAUGCAUGUUAAGGGCAGAUAAAAAAAG

>Baphidicold_Str
UAUAGACUUUAAACUAGCACGGGCGUAGAUAAACuUCUGCGCCCGUGUAAUUAUUUUUAAAUCAGGUAGAGUAAG

>Klebsiella_pneumoniae CP000964
AACGGGUAAAACACUGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCACGCAUGUUAAGGGGCUGAAAAACAC

>Escherichia_fergusonii AEVY01000001.1
AACGGGUAAUUAUACUGACACGGGCGAAGAGGaaUUUcCUCUCCGCCCGUGCAUUCAUCUAGGGGCAAAUCAAAAAAG

>Enterobacter_sp 638
UGCAGGGUAAUUAAUCGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCUCGCAUGUUAAGGGGCUGAAAAAACCA

>Cronobacter_sakazakii BAA-894
CAGGGUAGAUAACACUGGCACGGGCGUAGGAGUUUcUCCACGCCCGUGCACGCAUGUCAGGGGCAGAUAAAACAUG

>Erwinia_tasmaniensis
AAUUAACCAGCACGGGCGUAGGAUUUUaUCCACGCCCGUGCACGCAUGUUAAGGGCAGGAUCAAUAAC

>Photorhabdus_luminescens BX571860.1
AACUGUUGcuGAUUAACCGGCACGGGCGUUGAGGGAACUCUGCGCCCGUGCGCGCAUGUUAAGGGUAAAAUAAGAGAU

>Sodalis_glossinidius
AACAGGCACGGGCGUCGGGGAAAcUCUGCGCCCGUGCAAGCAUGUUGAGGGCAGGAAGAGACAA

>Pectobacterium_carotovorum PBR1692
AAUCAGCACGGGCGUCGAGGAAACUCUACGCCCGUGCACGCAUGUUGAGGGGCAGGAAAAGAAA

>Pectobacterium_atrosepticum SCRI1043
AAUCAGCACGGGCGUCGAGGAAACUCUGCGCCCGUGCACGCAUGUUgAAGGGCAGGAAAAGAAAU

>Klebsiella_pneumoniae 342
AACGGGUAAAACACUGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCACGCAUGUUAAGGGGCUGAAAAACAC

>Citrobacter_koseri CP000822.1
AACGGGUAAAAUACUGGCACGGGCGAAGAGGGUUcCUCUCCGCCCGUGCACGCAUGUUAGGGGCAGAUAAAAACAA

>Salmonella_enterica
AACGGGUAAUUACUGGCACGGGCGAAGAGGUUUcCUCUCCGCCCGUGCAUGCAUGUUAAGGGCAGAUAAAAAGAG

>Escherichia_albertii TWO 7627
AACGGGUAAUUAUACUGACACGGGCGAAGGGGauUUUcCUCUCCGCCCGUGCAUUCAUCCAGGGGCAAAUAAAAAGAG

>Providencia_stuartii 25827
AAUGGUUACUGAUACCgAACACGGGCGUCGAGGAAACUCCACGCCCGUGUCGGCAUGUUAAGGGUAAAAUAAAUGGC

>Proteus_penneri 35198
AAUAGUUACUgAUACCGAACACGGGCGUCGAGGAAACUCAACGCCCGUGUGCUUAUGUCAAGGGUAAUCUAGAAUGG

>Proteus_mirabilis
AACAGUUACUGAUACCGaaACACGGGCGUCGAGGCAACUCUACGCCCGUGUGCUUAUGUCAAGGGUAAUCCAGAAUGG

>Buchnera_aphidicola
AUUAGAGUAUuUAAACCAGCACGGGCGUAGAAAAAAUCUACGCCCGUGUACUUAUUUUGAAAGCAGGAAGAAUAGC

>Providencia_alcalifaciens DSM 30120
GACUGAUACCGGACACGGGCGUUGAGGGAACUCUGCGCCCGUGUCUGCAUGUUAAGGGAAAAAUAAAUGGC

>Serratia_proteamaculans 568
UACCGCAAACCAGCACGGGCGUCGAGGCAACUCUACGCCCGUGCACGCAUGUUAAGGGUUACAGAAAUAAU

>Yersinia_pseudotuberculosis IP 31758 CP000720.1
UACUGUUGCUgGAAACCAGCACGGGCGUCGAGGAAACUCUACGCCCGUGCACGCAUGUUAAGGGUAGGAAAAAAGAG

>Citrobacter_youngae 29220
UGCAGGGUAGUUAAUCGGCACGGGCGUAGAGGUUUcCUCUUCGCCCGUGCUUGUAUGUUAAGGGGCUUAAAAAAAC

Table 1

Table 1 lists each species name with its accession number, where available, and its sequence. The sequences were obtained from Rfam, and there are 30 in total.
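As an illustration of the FASTA conversion step, the sketch below writes (name, accession, sequence) records such as those in Table 1 as FASTA text. It is not the procedure used in the study; the function name to_fasta and the 60-character line width are assumptions made for the example.

```python
def to_fasta(records, width=60):
    """Return FASTA text for an iterable of (name, accession, sequence)."""
    lines = []
    for name, accession, sequence in records:
        # Header line: '>' then the name, plus the accession when one exists.
        header = f">{name} {accession}" if accession else f">{name}"
        lines.append(header)
        # Drop whitespace left over from copying, then wrap the sequence.
        residues = "".join(sequence.split())
        for start in range(0, len(residues), width):
            lines.append(residues[start:start + width])
    return "\n".join(lines) + "\n"


# Placeholder sequences for illustration only, not data from Table 1.
example = [("Species_one", "ACC0001", "ACGU ACGU"), ("Species_two", "", "GGCC")]
print(to_fasta(example, width=4))
```

Records without an accession number simply get a header containing the name alone, which matches how such entries appear in Table 1.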

Results

Phylogenetic Tree1

Phylogenetic Tree1 is a Neighbor Joining tree, which is built from the distances between pairs of taxa.

Phylogenetic Tree2

Phylogenetic Tree2 is a Maximum Likelihood tree, which is based on the most probable evolutionary history.

Phylogenetic Tree3

Phylogenetic Tree3 is a Maximum Parsimony tree, which is based on the smallest number of mutations.

These phylogenetic trees show that the eukaryotic domain was not maintained. The four eukaryotic species are spread throughout the trees, and none of them share a rooted ancestor. The species Cucumis sativus is placed the farthest from the other species in each tree created.
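A Neighbor Joining tree is built from a matrix of pairwise distances between aligned sequences. The sketch below computes one simple distance, the proportion of differing sites (p-distance). This is an assumed choice for illustration only: the distance model used in MEGA6 for this study is not stated, and the function names p_distance and distance_matrix are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping taxon name to aligned sequence."""
    names = list(aligned)
    return {(a, b): p_distance(aligned[a], aligned[b]) for a in names for b in names}


# Placeholder alignment for illustration only.
toy = {"taxon_1": "ACGU-A", "taxon_2": "ACGAUA", "taxon_3": "UCGAUA"}
for pair, dist in distance_matrix(toy).items():
    print(pair, round(dist, 3))
```

Neighbor Joining then repeatedly joins the pair of taxa that minimizes a criterion derived from this matrix; that clustering step is left to a package such as MEGA.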

Figure 2

Figure 2 is an example of the secondary structure of Oryza sativa generated by ViennaRNA Web Services.
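RNAfold reports each MFE structure in dot-bracket notation, where matched parentheses mark base pairs and dots mark unpaired nucleotides. The sketch below shows one way stem lengths and terminating (hairpin) loop sizes could be counted from such a string. It is a simplified illustration with hypothetical function names; it does not reproduce the priority rules or the bulge and internal loop counts that were used to build Matrix1.

```python
def pair_table(structure):
    """Map every paired position to its partner, using a stack of openers."""
    stack = []
    partner = {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            opener = stack.pop()
            partner[opener] = pos
            partner[pos] = opener
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def stems_and_hairpins(structure):
    """Return (stem lengths in base pairs, hairpin loop sizes in nucleotides)."""
    partner = pair_table(structure)
    stems = []
    hairpins = []
    for i in sorted(p for p in partner if p < partner[p]):
        j = partner[i]
        # A stem starts where the pair is not stacked on an enclosing pair.
        if partner.get(i - 1) != j + 1:
            length = 1
            while partner.get(i + length) == j - length and i + length < j - length:
                length += 1
            stems.append(length)
        # A hairpin loop is closed by a pair with no paired base inside it.
        if all(k not in partner for k in range(i + 1, j)):
            hairpins.append(j - i - 1)
    return stems, hairpins


# Toy structure for illustration only, not an RNAfold result from the study.
print(stems_and_hairpins("((.((...)).))"))
```

In the toy structure, the two unpaired positions between the stems form a small internal loop, which this sketch deliberately leaves uncounted.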

Species Name Matrix Features

Oryza_sativa 03000020247A001130000400000D

Nasonia_vitripennis 4A030000055A0011300004000009

Cucumbis_Sativus 1200000004440000G5411400000B

A_nasoniea 550000000464100034400400000A

E_coli 230000000008001160000300000R

Pantoea_ananatis 34001100457E010000000400000B

Pectobacterium_wasabiae 29020000045A001130000400000A

Proteus_panneri 2C030000045A0011300004000006

Salmonella_bongori 130000000009001150000313900A

Klebsiella_pneumoniae 1300000004640012B00013000008

Baphidicold_Str GD0000000000000000000627A001

Escherichia_fergusonii 230000000008001160000300000R

Enterobacter_sp 23000000054G2011500003000009

Cronobacter_sakazakii DE1000003000000000000354300B

Erwinia_tasmaniensis 7E1000000000000000000300000R

Photorhabdus_luminescens 45002111H003000000000400000A

Sodalis_glossinidius 5A0011000000000000000454D001

Pectobacterium_carotovorum 5A0011003000000000000454400A

Pectobacterium_atrosepticum 5A0011003000000000000454400B

Citrobacter_koseri 2300000034640012B00013000008

Salmonella_enterica 130000003009001150000313900A

Escherichia_albertii 230000000008001160000300000R

Providencia_stuartii 5m3211000000000000000400000A

Proteus_penneri 2C000200045A0011300004000006

Proteus_mirabilis 4A010000055A0011300004000009

Buchnera_aphidicola 12000000093D000000000500000G

Providencia_alcalifaciens 35000000005D001130000400000H

Serratia_proteamaculans 16020000050A001130000400030C

Yersinia_pseudotuberculosis 040011003009000090000400009B

Citrobacter_youngae 5300001050340011900000000309

Matrix1

Matrix1 consists of the secondary features for each species.

Phylogenetic Tree4

Phylogenetic Tree4 was made from the matrix of secondary structure features (Matrix1).

This secondary structure tree shows the four eukaryotic species much closer together than the primary structure trees do. However, even though the species are closer, the domains were not maintained.

Discussions

The primary structure phylogenetic trees show that the two domains were not maintained. The four eukaryotic species were spread throughout the trees, with Cucumis sativus the farthest apart from the other species. The prediction, based on the work done by Fengjie Sun, was that the prokaryotic species would be rooted in a separate taxon from the four eukaryotic species. Separately rooted taxa would show that these domains evolved at different times. However, the trees created in this research did not show these evolutionary distinctions, which suggests experimental error. Further reading on the tpke11 molecule revealed that many of the sequences found with the microarray technique were false positives (Griffiths-Jones & Bateman, 439-441). This means that the microarray showed a gene for translation that might not actually be present. Each sequence needs to be researched further, and only species that have this gene should be used for phylogenetic analysis.

The secondary structure tree shows the four eukaryotic species clustered closer together. However, there is still no distinction between the two domains. Secondary structure and shape relate to the function of the molecule. The eukaryotic species could be closer together on the tree because the functions of tpke11 in eukaryotic species differ from its functions in prokaryotic species. The tpke11 molecule could be regulatory in both domains but regulate the expression of different genes in each taxon. This suggests that the eukaryotic species evolved at a different time than the prokaryotic species, which correlates with the research done by Fengjie Sun and Caetano-Anollés.

Conclusion

In this study, the phylogenetic evolution of tpke11 was analyzed using primary and secondary structures. Primary and secondary structure trees were created using award-winning phylogenetic methods. The primary trees showed that the prokaryotic and eukaryotic species did not develop independently. However, the secondary structure tree showed that the eukaryotic species could have evolved much more independently from the prokaryotic species. This has to do with the shape of the molecule: shape relates to function, and the tree suggests that the eukaryotic species evolved differently than most of the prokaryotic species analyzed in this research. The process described in “Evolved RNA Secondary Structure and the Rooting of the Universal Tree of Life” proved repeatable with tpke11. Future work remains to be done. Each individual sequence needs to be examined to see whether any mistakes correlate with the skewed primary trees. Only species whose sequences are confirmed with multiple laboratory techniques should be analyzed. A molecule found in all three domains would be preferable, to see how the domains are connected through their phylogenetic structures. Lastly, more than 30 species will need to be analyzed so that a wider range of nucleotide sequences can be studied and the domains noted.

References

Griffiths-Jones, S, & Bateman, A, Rfam Database website, Nucl. Acids Res. 31 (1), pp. 439-441, September 2002.

Fengjie, S, & Caetano-Anollés, G 2009, 'The Evolutionary History of the Structure of 5S Ribosomal RNA', Journal Of Molecular Evolution, 69, 5, p. 430, Advanced Placement Source, EBSCOhost, viewed 29 November 2014.

Hershberg, Ruth, Shoshy Altuvia, and Hanah Margalit. "A Survey Of Small RNA-Encoding Genes In Escherichia Coli." Nucleic Acids Research 31.7 (2003): 1813-1820. MEDLINE with Full Text. Web. 20 Nov. 2014.

Pollock, DD 2003, 'The Zuckerkandl Prize: Structure and Evolution', Journal Of Molecular Evolution, 56, 4, pp. 375-376, Academic Search Complete, EBSCOhost, viewed 29 November 2014.

Zeng, Q, & Sundin, G 2014, 'Genome-wide identification of Hfq-regulated small RNAs in the fire blight pathogen Erwinia amylovora discovered small RNAs with virulence regulatory function', BMC Genomics, 15, p. 414, MEDLINE with Full Text, EBSCOhost, viewed 29 November 2014.

'The Zuckerkandl Prize: Structure and Evolution', Journal of Molecular Evolution, April 2003, vol. 56, issue 4, p. 375.
