

An algorithm for detecting TPP riboswitches in archaea

Department of Biology and Department of Math and Computer Science, Denison University, Granville, OH 43023

| Motif sequence | Level of conservation | Location in sequence |
|---|---|---|
| GGGG | High | Towards 5’ end of the riboswitch |
| UGAGA | Perfect conservation in all riboswitches | No more than 30 bases from GGGG |
| CCCU | Fair, some point mutations observed | Usually the same distance from UGAGA as GGGG is from UGAGA |
| AACCUGA | Low, most sequences had point mutations for this motif | Usually at the center of the riboswitch |
| AGGGA | Fair, some point mutations observed | Towards 3’ end of the riboswitch |

Riboswitch-mediated gene regulation by (a) prevention of translation initiation, (b) prevention of proper splicing, and (c) premature transcription termination.

Characteristic secondary structure of the TPP riboswitch in the presence and absence of TPP

Secondary structures of the riboswitches predicted in K. cryptofilum (a, b) and C. maquilingensis (c).

| Source genome of hit | fRNAdb match? | Nearby proteins |
|---|---|---|
| E. coli | Yes | 3’ side: thiamin biosynthesis protein thiC |
| E. coli | Yes | 3’ side: thiamine-binding periplasmic protein precursor |
| E. coli | Yes | 3’ side: hydroxyethylthiazole kinase |
| A. thaliana | Yes | Within an open reading frame. Gene regulation via splicing |
| T. volcanium | Yes | 3’ side: Major facilitator superfamily permease |
| T. volcanium | Yes | 3’ side: Major facilitator superfamily permease |
| T. acidophilum | Yes | 3’ side: Major facilitator superfamily permease |
| T. acidophilum | Yes | 3’ side: cationic amino acid transporter related protein |
| K. cryptofilum | None | Hypothetical protein Kcr_0861. Sequence is part of the coding region although it does not code for any conserved protein domain. Sequence is located near the 3’ end of the coding region |
| K. cryptofilum | None | 3’ side: permease for cytosine uracil thiamine allantoin |
| C. maquilingensis | None | 3’ side: Nucleoside diphosphate kinase |

References  

•  Miranda-Rios J, Navarro M and Soberon M. 2001. A conserved RNA structure (THI box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. USA. 98: 9736–9741.

•  Nudler E and Mironov AS. 2004. The riboswitch control of bacterial metabolism. Trends in Biochemical Sciences. 29(1): 11–17.

•  Serganov A, Polonskaia A, Phan AT, Breaker RR and Patel DJ. 2006. Structural basis for gene regulation by a thiamin pyrophosphate-sensing riboswitch. Nature. 441: 1167–1171.

•  Winkler WC and Breaker RR. 2003. Genetic control by metabolite binding riboswitches. Chembiochem. 4: 1024–1032.

Introduction  

•  Riboswitches are short sequences of non-coding RNA (100-200 nt in length) located in the untranslated regions (UTRs) of genes.

•  Riboswitches consist of highly specialized aptamer regions that recognize and bind specific metabolites (Winkler and Breaker, 2003).

•  The TPP riboswitch binds thiamin pyrophosphate (TPP).

•  Upon binding its metabolite, the riboswitch changes its structural conformation, which regulates gene expression (Nudler and Mironov, 2004).

The TPP Riboswitch

•  Has the characteristic structure displayed below (Miranda-Rios et al., 2001; Serganov et al., 2006).

•  Has a motif sequence, UGAGA, that is conserved 97% of the time.

•  Has been detected in the genomes of all three domains of life, but in only two archaeal species, both of the order Thermoplasmatales (Miranda-Rios et al., 2001).

Step 1: Identify motifs in TPP riboswitches.

•  Obtained sequences of 355 TPP riboswitches from fRNA Database.

•  Performed multiple sequence alignment using ClustalX2.

•  Identified six highly conserved motif sequences.

Step 2: Fragment whole genome of target for scanning.

•  Fragments are 700 nt long with a 200 nt overlap.

•  Each fragment is scanned for the motif sequences.
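The fragmentation step can be sketched as a sliding window. This is a minimal illustration, not the authors' code: the function name `fragment_genome` and the toy sequence are assumptions, and only the 700 nt size and 200 nt overlap come from the poster.

```python
def fragment_genome(genome, size=700, overlap=200):
    """Yield (start, fragment) pairs covering the genome with overlapping windows.

    The defaults mirror the poster's values; the function itself is hypothetical.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 500 nt between window starts for 700/200
    start = 0
    while True:
        yield start, genome[start:start + size]
        if start + size >= len(genome):
            break  # the last window reached the end of the genome
        start += step


genome = "ACGU" * 500  # toy 2000 nt sequence, not real data
fragments = list(fragment_genome(genome))
print([(start, len(frag)) for start, frag in fragments])
```

With a 200 nt overlap, any candidate of up to 201 nt (which covers riboswitches at the short end of the 100-200 nt range quoted in the Introduction) lies entirely inside at least one fragment, so a riboswitch that straddles a window boundary is not split across every fragment that contains part of it.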

Step 3: Modified Smith-Waterman algorithm.

•  Find all alignments in each fragment to motifs with score above threshold.
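The poster does not say how Smith-Waterman was modified, so the following is only a sketch of the general idea: a standard local-alignment recurrence that reports every fragment position where an alignment to the motif reaches a threshold. The scoring values (match +2, mismatch -1, gap -2), the threshold, and the name `local_alignment_hits` are all assumptions.

```python
def local_alignment_hits(fragment, motif, threshold, match=2, mismatch=-1, gap=-2):
    """Return (end_position, score) for each 1-based fragment position where the
    best local alignment to `motif` ending there scores at least `threshold`.

    Scoring parameters are illustrative assumptions, not the poster's values.
    """
    m = len(motif)
    prev = [0] * (m + 1)  # previous row of the Smith-Waterman matrix
    hits = []
    for i, base in enumerate(fragment, start=1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if base == motif[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
        best = max(curr)
        if best >= threshold:
            hits.append((i, best))
        prev = curr
    return hits


# A perfect UGAGA match, and the same site with one point mutation.
print(local_alignment_hits("AAUGAGACC", "UGAGA", threshold=10))
print(local_alignment_hits("AAUGCGACC", "UGAGA", threshold=7))
```

Lowering the threshold for a motif is one way to tolerate the point mutations noted for the less conserved motifs, at the cost of more spurious hits for the later steps to filter.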

Step 4: Infer the best sequence of motifs.

•  Dynamic programming algorithm determines the sequence of individual motifs in each fragment that results in the best total score.
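One way to read this step is as a chaining problem: pick one hit per motif so that the hits appear in 5’ to 3’ order and the summed score is maximal. The sketch below makes that assumption, requires every motif to be present, and ignores spacing constraints such as "no more than 30 bases from GGGG"; the poster does not give these details, and `best_motif_chain` is a hypothetical name.

```python
def best_motif_chain(hits_per_motif):
    """Choose one hit per motif, in order, maximizing the total score.

    hits_per_motif[k] is a list of (position, score) hits for the k-th motif
    in 5'->3' order. Returns (total_score, [positions]) or None if no chain
    with strictly increasing positions exists.
    """
    if not hits_per_motif:
        return None
    prev = []  # best chains ending at each hit of the previous motif
    for k, hits in enumerate(hits_per_motif):
        curr = []
        for pos, score in hits:
            if k == 0:
                curr.append((score, pos, [pos]))
                continue
            # Only chains that end upstream of this hit can be extended.
            candidates = [(total, path) for total, p, path in prev if p < pos]
            if candidates:
                total, path = max(candidates, key=lambda c: c[0])
                curr.append((total + score, pos, path + [pos]))
        if not curr:
            return None  # this motif cannot be placed after the previous ones
        prev = curr
    total, _, path = max(prev, key=lambda entry: entry[0])
    return total, path


# Hypothetical hits for three motifs in one fragment: (position, score).
hits = [[(10, 8), (300, 8)], [(25, 10), (320, 10)], [(40, 7), (15, 8)]]
print(best_motif_chain(hits))
```

The higher-scoring hit at position 15 for the third motif is rejected because no second-motif hit lies upstream of it, which is the point of scoring the whole sequence of motifs rather than each motif in isolation.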

Step 5: Predict secondary structure and function.

•  Folded using the RNAfold server and then compared with the characteristic structure.

•  Putative riboswitches have a strong resemblance to the characteristic structure.

•  Nearby genes were determined using NCBI BLAST and the UCSC Genome Browser.
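RNAfold reports predicted structures in dot-bracket notation. The poster does not describe how predictions were compared with the characteristic structure, so the sketch below is only one crude, assumed proxy: parse the base pairs and count hairpin loops, which could then be checked against the number expected from the characteristic structure. The example string is made up, not an RNAfold result.

```python
def base_pairs(dot_bracket):
    """Return sorted (i, j) index pairs from a dot-bracket structure string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def count_hairpins(dot_bracket):
    """Count hairpin loops: base pairs that enclose only unpaired bases."""
    return sum(
        1
        for i, j in base_pairs(dot_bracket)
        if set(dot_bracket[i + 1:j]) <= {"."}
    )


# Made-up structure: an outer stem enclosing a junction with two hairpin arms.
structure = "((..((...))..((...))..))"
print(len(base_pairs(structure)), count_hairpins(structure))
```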

Figure labels: Stem, Loop, Junction

Methods  

Possible new methods of gene regulation in K. cryptofilum:

•  One of the two predictions in the K. cryptofilum genome was found to be located within an ORF.

•  No information is available about whether the ORF is actually a gene.

•  No information is available about possible introns in the coding region.

•  A novel mechanism of gene regulation, such as ribosome shunting, may be employed here.

Generalizing the algorithm:

•  Success of the devised algorithm suggests that it is possible to apply it to other kinds of riboswitches.

•  The algorithm could be primed in a similar manner with the motif sequences and characteristic structures of other riboswitches.

•  Further improvements to the algorithm include a more efficient means of comparing secondary structures than the one employed here, and automated detection of motif sequences in known riboswitch sequences.

Results

Putative TPP riboswitches predicted by the algorithm (table row groups: Testing, New)

Discussion  

TPP riboswitch conformation with and without TPP

Testing on known riboswitches:

•  Tested on genomes known to possess at least one TPP riboswitch.

•  Detected all known TPP riboswitches in each genome.

•  Secondary structures of predicted riboswitches were similar to the characteristic structure.

Scanning other archaea:

•  Executed the algorithm on the genomes of 12 archaeal species outside the order Thermoplasmatales.

•  Three putative riboswitches were detected in the genomes of Caldivirga maquilingensis and Korarchaeum cryptofilum.

Chinmoy I.S. Bhatiya, Jessen T. Havill, Jeffrey S. Thompson