Riboswitches: the oldest regulatory system? Mikhail Gelfand December 2004



Riboflavin biosynthesis pathway

[Figure: riboflavin biosynthesis pathway.
Gene labels: ribA, ribB, ribD, ribG, ribH, ribE, ypaA.
Enzymes: GTP cyclohydrolase II; 3,4-DHBP synthase; pyrimidine deaminase; pyrimidine reductase; riboflavin synthase (two chains).
Precursors: GTP (purine biosynthesis pathway) and ribulose-5-phosphate (pentose-phosphate pathway).
Intermediates: 2,5-diamino-6-hydroxy-4-(5'-phosphoribosylamino)pyrimidine; 5-amino-6-(5'-phosphoribosylamino)uracil; 5-amino-6-(5'-phosphoribitylamino)uracil; 3,4-dihydroxy-2-butanone-4-phosphate; 6,7-dimethyl-8-ribityllumazine.
End product: riboflavin.]

5’ UTR regions of riboflavin genes from various bacteria

Helices: 1 2 2’ 3 Add. 3’ Variable 4 4’ 5 5’ 1’
         =========> ==> <== ===> -><- <=== -> <- ====> <==== ==> <== <=========

BS   TTGTATCTTCGGGG-CAGGGTGGAAATCCCGACCGGCGGT 21 AGCCCGTGAC-- 8 4 8 -----TGGATTCAGTTTAA-GCTGAAGCCGACAGTGAA-AGTCTGGAT-GGGAGAAGGATGAT
BQ   AGCATCCTTCGGGG-TCGGGTGAAATTCCCAACCGGCGGT 19 AGTCCGTGAC-- 8 5 8 -----TGGATCTAGTGAAACTCTAGGGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGGATATG
BE   TGCATCCTTCGGGG-CAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGATCCGGTGCGATTCCGGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGGATGCC
HD   TTTATCCTTCGGGG-CTGGGTGGAAATCCCGACCGGCGGT 19 AGTCCGTGAC-- 10 4 10 -----TGGACCTGGTGAAAATCCGGGACCGACAGTGAA-AGTCTGGAT-GGGAGAAGGAAACG
Bam  TGTATCCTTCGGGG-CTGGGTGAAAATCCCGACCGGCGGT 23 AGCCCGTGAC-- 8 4 8 -----TGGATTCAGTGAAAAGCTGAAGCCGACAGTGAA-AGTCTGGAT-GGGAGAAGGATGAG
CA   GATGTTCTTCAGGG-ATGGGTGAAATTCCCAATCGGCGGT 2 AGCCCGCAA--- 3 4 3 ------AGATCCGGTTAAACTCCGGGGCCGACAGTTAA-AGTCTGGAT-GAAAGAAGAAATAG
DF   CTTAATCTTCGGGG-TAGGGTGAAATTCCCAATCGGCGGT 2 AGCCCGCG---- 7 6 7 --------ATTTGGTTAAATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GGAAGAAGATATTT
SA   TAATTCTTTCGGGG-CAGGGTGAAATTCCCAACCGGCAGT 6 AGCCTGCGAC-- 11 3 11 -----CTGATCTAGTGAGATTCTAGAGCCGACAGTTAA-AGTCTGGAT-GGGAGAAAGAATGT
LLX  ATAAATCTTCAGGG-CAGGGTGTAATTCCCTACCGGCGGT 2 AGCCCGCGA--- 4 4 4 -----ATGATTCGGTGAAACTCCGAGGCCGACAGT-AT-AGTCTGGAT-GAAAGAAGATAATA
PN   AACTATCTTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 2 AGCCCACGA--- 3 4 3 -----ATGATTTGGTGAAATTCCAAAGCCGACAGT-AT-AGTCTGGAT-GAAAGAAGATAAAA
TM   AAACGCTCTCGGGG-CAGGGTGGAATTCCCGACCGGCGGT 3 AGCCCGCGAG-- 5 4 5 -----TTGACCCGGTGGAATTCCGGGGCCGACGGTGAA-AGTCCGGAT-GGGAGAGAGCGTGA
DR   GACCTCTTTCGGGG-CGGGGCGAAATTCCCCACCGGCGGT 15 AGCCCGCGAA-- 8 12 9 -----CCGATGCCGCGCAACTCGGCAGCCGACGGTCAC-AGTCCGGAC-GAAAGAAGGAGGAG
TQ   CACCTCCTTCGGGG-CGGGGTGGAAGTCCCCACCGGCGGT 3 AGCCCGCGAA-- 5 4 5 -----CCGACCCGGTGGAATTCCGGGGCCGACGGTGAA-AGTCCGGAT-GGGAGAAGGAGGGC
AO   AATAATCTTCAGGG-CAGGGTGAAATTCCCGATCGGCGGT 2 AGTCCGCGA--- 7 7 7 -----AGGAACCGGTGAGATTCCGGTACCGACAGT-AT-AGTCTGGAT-GGAAGAAGATGAAA
DU   TTTAATCTTCAGGG-CAGGGTGAAATTCCCGATCGGTGGT 2 AGTCCGCGA--- 13 4 12 -----AGGAACTAGTGAAATTCTAGTACCGACAGT-AT-AGTCTGGAT-GGAAGAAGAGCAGA
CAU  GAAGACCTTCGGGG-CAAGGTGAAATTCCTGATCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGACCCGGTGTGATTCCGGGGCCGACGGT-AT-AGTCCGGAT-GGGAGAAGGTCGGC
FN   TAAAGTCTTCAGGG-CAGGGTGAAATTCCCGACCGGTGGT 2 AGTCCACG---- 5 4 5 -------GATTTGGTGAAATTCCAAAACCGACAGT-AG-AGTCTGGAT-GGGAGAAGAATTAG
TFU  ACGCGTGCTCCGGG-GTCGGTGAAAGTCCGAACCGGCGGT 3 AGTCCGCGAC-- 8 5 8 -----TGGAACCGGTGAAACTCCGGTACCGACGGTGAA-AGTCCGGAT-GGGAGGTAGTACGTG
SX   -AGCGCACTCCGGG-GTCGGTGAAAGTCCGAACCGGCGGT 3 AGTCCGCGAC-- 8 5 8 -----TTGACCAGGTGAAATTCCTGGACCGACGGTTAA-AGTCCGGAT-GGGAGGCAGTGCGCG
BU   GTGCGTCTTCAGGG-CGGGGTGAAATTCCCCACCGGCGGT 30 AGCCCGCGAGCG 137 GTCAGCAGATCTGGTGAGAAGCCAGAGCCGACGGTTAG-AGTCCGGAT-GGAAGAAGATGTGC
BPS  GTGCGTCTTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 21 AGCCCGCGAGCG 8 4 8 GTCAGCAGATCTGGTCCGATGCCAGAGCCGACGGTCAT-AGTCCGGAT-GAAAGAAGATGTGC
REU  TTACGTCTTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 31 AGCCCGCGAGCG 7 5 7 GTCAGCAGATCTGGTGAGAGGCCAGGGCCGACGGTTAA-AGTCCGGAT-GAAAGAAGATGGGC
RSO  GTACGTCTTCAGGG-CGGGGTGGAATTCCCCACCGGCGGT 21 AGCCCGCGAGCG 11 3 11 GTCAGCAGATCCGGTGAGATGCCGGGGCCGACGGTCAG-AGTCCGGAT-GGAAGAAGATGTGC
EC   GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 17 AGCCCGCGAGCG 8 4 8 GACAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAG-AGTCCGGAT-GGGAGAGAGTAACG
TY   GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 67 AGCCCGCGAGCG 8 3 8 GTCAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAA-AGTCCGGAT-GGGAGAGGGTAACG
KP   GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 20 AGCCCGCGAGCG 8 4 8 GTCAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAA-AGTCCGGAT-GGGAGAGAGTAACG
HI   TCGCATTCTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 2 AGCCCACGAGCG 26 9 30 GTCAGCAGATTTGGTGAAATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GAAAGAGAATAAAA
VK   GCGCATTCTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 14 AGCCCACGAGCG 11 9 11 GTCAGCAGATTTGGTGAGAATCCAAAGCCGACAGT-AT-AGTCTGGAT-GAAAGAGAATAAGC
VC   CAATATTCTCAGGG-CGGGGCGAAATTCCCCACCGGTGGT 13 AGCCCACGAGCG 5 4 5 GTCAGCAGATCTGGTGAGAAGCCAGGGCCGACGGTTAC-AGTCCGGAT-GAGAGAGAATGACA
YP   GCTTATTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 40 AGCCCGCGAGCG 16 6 16 GTCAGCAGACCCGGTGTAATTCCGGGGCCGACGGTTAT-AGTCCGGAT-GGGAGAGAGTAACG
AB   GCGCATTCTCAGGG-CAGGGTGAAAGTCCCTACCGGTGGT 25 AGCCCACGAGCG 16 4 27 GTCAGCAGATTTGGTGCGAATCCAAAGCCGACAGTGAC-AGTCTGGAT-GAAAGAGAATAAAA
BP   GTACGTCTTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 18 AGCCCGCGAGCG 10 4 10 GTCAGCAGACCTGGTGAGATGCCAGGGCCGACGGTCAT-AGTCCGGAT-GAGAGAAGATGTGC
AC   ACATCGCTTCAGGG-CGGGGCGTAATTCCCCACCGGCGGT 16 AGCCCGCGAGCA 10 3 11 ---CGCAGATCTGGTGTAAATCCAGAGCCGACGGT-AT-AGTCCGGAT-GAAAGAAGACGACG
Spu  AACAATTCTCAGGG-CGGGGTGAAACTCCCCACCGGCGGT 34 AGCCCGCGAGCG 6 6 6 GTCAGCAGATCTGGTG 52 TCCAGAGCCGACGGT 31 AGTCCGGAT-GGAAGAGAATGTAA
PP   GTCGGTCTTCAGGG-CGGGGTGTAAGTCCCCACCGGCGGT 13 AGCCCGCGAGCG 7 3 7 GTCAGCAGATCTGGTGCAACTCCAGAGCCGACGGTCAT-AGTCCGGAT-GAAAGAAGGCGTCA
AU   GGTTGTTCTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 17 AGCCCGCGAGCG 7 9 7 GTCAGCAGATCCGGTGAGAGGCCGGAGCCGACGGT-AT-AGTCCGGAT-GGAAGAGGACAAGG
PU   AAACGTTCTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 19 AGCCCGCGAGCG 19 4 18 GTCAGCAGACCCGGTGTGATTCCGGGGCCGACGGTCAC-AGTCCGGATGAAGAGAGAACGGGA
PY   TAACGTTCTCAGGG-CGGGGTGCAACTCCCCACCGGCGGT 19 AGCCCGCGAGCG 15 4 16 GTCAGCAGACCCGGTGTGATTCCGGGGCCGACGGTCAT-AGTCCGGATGAAGAGAGAGCGGGA
PA   TAACGTTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 19 AGCCCGCGAGCG 14 4 13 GTCAGCAGACCCGGTGCGATTCCGGGGCCGACGGTCAT-AGTCCGGATAAAGAGAGAACGGGA
MLO  TAAAGTTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 16 AGCCCGCGAGCG 8 5 8 GTCAGCAGATCCGGTGTGATTCCGGAGCCGACGGTTAG-AGTCCGGAT-GAAAGAGGACGAAA
SM   AAGCGTTCTCAGGG-CGGGGTGAAATTCCCCACCGGCGGT 34 AGCCCGCGAGCG 8 3 8 GTCAGCAGATCCGGTCGAATTCCGGAGCCGACGGTTAT-AGTCCGGAT-GGAAGAGAGCAAGC
BME  GCTTGTTCTCGGGG-CGGGGTGAAACTCCCCACCGGCGGT 17 AGCCCGCGAGCG 10 15 10 GTCAGCAGATCCGGTGAGATGCCGGAGCCGACGGTTAA-AGTCCGGAT-GGAAGAGAGCGAAT
BS   ATCAATCTTCGGGG-CAGGGTGAAATTCCCTACCGGCGGT 18 AGCCCGCGA--- 5 4 5 -----AGGATTCGGTGAGATTCCGGAGCCGACAGT-AC-AGTCTGGAT-GGGAGAAGATGGAG
BQ   GTCTATCTTCGGGG-CAGGGTGAAAATCCCGACCGGCGGT 27 AGCCCGCGA--- 3 5 3 -----AGGATTTGGTGTGATTCCAAAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGGAG
BE   ATTCATCTTCGGGG-CAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGATCCGGTGCGAGTCCGGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGAAG
CA   AATGATCTTCAGGG-CAGGGTGAAATTCCCTACCGGCGGT 2 AGCCCGCGAG-- 3 4 3 ----TATGATCCGGTTTGATTCCGGAGCCGACAGT-AA-AGTCTGGAT-GAAAGAAGATATAT
DF   GAAGATCTTCGGGG-CAGGGTGAAATTCCCTACCGGCGGT 2 AGCCCGCG---- 6 4 6 -------GATTTGGTGAGATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GAGAGAAGATATTT
EF   GTTCGTCTTCAGGGGCAGGGTGTAATTCCCGACCGGTGGT 3 AGTCCACGAC-- 5 3 5 ----ATTGAATTGGTGTAATTCCAATACCGACAGT-AT-AGTCTGGAT--AAAGAAGATAGGG
LLX  AAATATCTTCAGGG-CACCGTGTAATTCGGGACCGGCGGT 21 ACTCCGCGAT-- 4 4 4 -----TTGAAGCAGTGAGAATCTGCTAGCGACAGT-AA-AGTCTGGAT-GGAAGAAGATGAAC
LO   GTTCATCTTCGGGG-CAGGGTGCAATTCCCGACCGGTGGT 3 AGTCCACGAT-- 3 10 3 ----TTGACTCTGGTGTAATTCCAGGACCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGTTG
PN   AAGAGTCTTCAGGG-CAGGGTGAAATTCCCGACCGGCGGT 125 AGTCCGTG---- 3 4 3 -------GATGTGGTGAGATTCCACAACCGACAGT-AT-AGTCTGGAT-GGGAGAAGACGAAA
ST   AAGTGTCTTCAGGG-CAGGGTGTGATTCCCGACCGGCGGT 14 AGTCCGCG---- 3 4 3 -------GATGTGGTGTAACTCCACAACCGACAGT-AT-AGTCTGGAT-GAGAGAAGACCGGG
MN   AAGTGTCTTCAGGG-CAGGGTGAGATTCCCGACCGGCGGT 104 AGTCCGCG---- 3 4 3 -------GATGTGGTGAAATTCCACAACCGACAGT-AA-AGTCTGGAT-GGGAGAAGACTGAG
SA   ATTCATCTTCGGGG-TCGGGTGTAATTCCCAACCGGCAGT 6 AGCCTGCGAC-- 11 3 11 -----CTGATCTAGTGAGATTCTAGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGGAG
AMI  TCACAGTTTCAGGG-CGGGGTGCAATTCCCCACTGGCGGT 14 AGCCCGCGC--- 5 5 5 ------TGATCTGGTGCAAATCCAGAGCCAACGGT-AT-AGTCCGGAT-GGAAGAAACGGAGC
DHA  ACGAACCTTCGAGG-TAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCAAC-- 11 4 11 --CGACTGACTTGGTGAGACTCCAAGGCCGACGGT-AT-AGTCCGGAT-GGGAGAAGGTACAA
FN   AATAATCTTCGGGG-CAGGGTGAAATTCCCGACCGGTGGT 2 AGTCCACG---- 4 6 4 -------GATTTGGTGAAATTCCAAAACCGACAGT-AG-AGTCTGGAT-GAGAGAAGAAAAGA
GLU  ---TGTTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 28 AGCCCGCGAGCG 10 4 10 GTCAGCAGATCCGGTTAAATTCCGGAGCCGACGGTCAT-AGTCCGGAT-GCAAGAGAACC---

Conserved secondary structure of the RFN-element

[Figure: conserved secondary structure of the RFN-element. Labels: 5' and 3' ends; helices 1 to 5; variable stem-loop; additional stem-loop. Consensus fragments legible along the helices and loops include NNNNyYYUC, NNNNrRRAG, NgGGNcCC, rgGGxc, ARRgxuAG, GRCCYG, AcCG, AGCCRGY, YRCC and AGU.]

Capitals: invariant (absolutely conserved) positions.

Lower case letters: strongly conserved positions.

Dashes and stars: obligatory and facultative base pairs

Degenerate positions: R = A or G; Y = C or U; K = G or U; B= not A; V = not U. N: any nucleotide. X: any nucleotide or deletion
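The degenerate-position legend can be turned into a simple pattern matcher. The following is a minimal sketch, not the method used to build the consensus: it assumes that case (invariant versus strongly conserved) is ignored, that DNA input is converted to RNA, and the function names are made up for illustration.

```python
import re

# Degenerate codes exactly as listed in the legend (RNA alphabet).
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",     # A or G
    "Y": "[CU]",     # C or U
    "K": "[GU]",     # G or U
    "B": "[CGU]",    # not A
    "V": "[ACG]",    # not U
    "N": "[ACGU]",   # any nucleotide
    "X": "[ACGU]?",  # any nucleotide or deletion
}

def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus fragment into a compiled regular expression.

    Upper/lower case in the consensus is ignored (an assumption of this sketch).
    """
    return re.compile("".join(CODES[ch] for ch in consensus.upper()))

def matches(consensus: str, dna: str) -> bool:
    """True if the whole DNA fragment is compatible with the consensus."""
    rna = dna.upper().replace("T", "U")
    return consensus_to_regex(consensus).fullmatch(rna) is not None
```

For example, `matches("GRCCYG", "GACCTG")` is true, while `matches("GRCCYG", "GCCCTG")` is false because C is not allowed at the R position.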

Attenuation of transcription

[Figure labels: the RFN element; terminator; antiterminator]

Bam  GACAAAAAAATATTGATTGTATCCTTCGGGGCTGGGTG --- TCTGGATGGGAGAAGGATGA 59 ----------GTAAAGCCCCGAATGTGTAA---ACATTCGGGGCTTTTTGACGCCAAAT
BS   GGACAAATGAATAAAGATTGTATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGGATGA 59 ----------CTAAAGCCCCGAATTTTTTA--TAAATTCGGGGCTTTTTTGACGGTAAA
BQ   CTATAATTTGAGCAAACAGCATCCTTCGGGGTCGGGTG --- TCTGGATGGGAGAAGGATAT 250 -----------CCAAACCCCAAGGATATTAAA--ATCCTTGGGGTTTTTTGTTTTTTTT
BE   ACATAACGATATAGTGATGCATCCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGGATGC 155 ------------TGAGCCCCCGGGGACAT--------CCCGGGGGTTTCATTTTTATTG
HD   AAATTGAATAATTAATTTTTATCCTTCGGGGCTGGGTG --- TCTGGATGGGAGAAGGAAAC 148 -------------ATGCCCCGTGAGAACAAAA-----TCTCTGGGGCTTTTTTGCGCGC
CA   TAATGGTAATTTAATAGGATGTTCTTCAGGGATGGGTG --- TCTGGATGAAAGAAGAAATA 34 -------------AATCTCCGAAGGATTACC----TTTCTTTGGAGATTTTTTTATTTG
DF   TAAATATAAATTTAATACTTAATCTTCGGGGTAGGGTG --- TCTGGATGGAAGAAGATATT 63 ------------TAAACCCTGAGTTAATT--------CTCAGGGTTTTTTGTTTAAAAA
LLX  ACTTTAGCTACAATTGAATAAATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAT 127 ----------AAAAGACCCTGAAATTTT------ATTTTAGGGTCTTATTTTTTATTAG
PN*  ATCATCTGTAATTGAATAACTATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAA 81 ----------TGTATGCCTTGAGTAGTCCCC---TATTCAAGGTATATTTTTTTGGAGG
PN*  ATCATCTGTAATTGAATAACTATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAA 19 ------------CGTGCTCTGAAATGATTACTTGTCATTTCAGAGCATTTTTGTTAATC
TM   AAAACTGAATACAAAAGAAACGCTCTCGGGGCAGGGTG --- TCCGGATGGGAGAGAGCGTG 13 -----------ATGGGACCCGAGA----------------GGGTCCCTTTTCTTTTACA
AO   ATTTGCAACAATTTTTTAATAATCTTCAGGGCAGGGTG --- TCTGGATGGAAGAAGATGAA 33 --------TTTACAAGCCTTGAGATCGAAAG----ATTTCAAGGCTTTTTTCATCATTA
DU   AATTTTTTTAATACTATTTTAATCTTCAGGGCAGGGTG --- TCTGGATGGAAGAAGAAGAG 47 --------TGCATAAGCCTTGAGATCTTAG----GATTTCAAGGCTTTTTCATTAGTTA
FN   TAATCGAATATGTAAAATAAAGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGAATTA 18 ----------ATATTGCTCAGACTTT------------GTTTGAGCATTTTTTTATTAA
SA   TATAACAATTTCATATATAATTCTTTCGGGGCAGGGTG --- TCTGGATGGGAGAAAGAATG 74 ------TTTTCTCCTTGCATCTTAATT----------GATGTGAGGATTTTTGTTTATA
DHA  ACTCTTTTTAGATGAATACGAACCTTCGAGGTAGGGTG --- TCCGGATGGGAGAAGGTACA 43 -----------GTTTATGCCTCGAGGAACACCATTTCCTCGAGGCATTTTTGTTCTTTC
FN   GAAAAATAAATATTAAAAATAATCTTCGGGGCAGGGTG --- TCTGGATGAGAGAAGAAAAG 40 ------------CTTACCCGAATTCTAT------------AATTCGGTTTTTTTATTTT
CA   AATATAAAAAAATAAAGAATGATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATATA 19 ------------TATGCCCTGACGTTTTT---------CGTTGGGGCTTTTTTAATGCT
DF   AAAATTAAAAAATCAAAGAAGATCTTCGGGGCAGGGTG --- TCTGGATGAGAGAAGATATT 45 ----------ATAAAAACTCGAAGATAGGG----TCTTCGAGTTTTTTGTTTTTCCTAA
BS   TAATTAAATTTCATATGATCAATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGGA 103 --AAAGAACCTTTCCGTTTTCGAGTAAGATGTGATCGAAAAGGAGAGAATGAAGTGAAA
BQ   GGGAAAATAGAATATCGGTCTATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGGA 54 -------ATTCTCCCTTTGTGTAAA------------ACACAAAGGGTTTTTTCGTTCTATG
BE   ATAAAAATGTATAAGCGATTCATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGAA 114 --------GGCAGCCTTCTTCTTGTGAGGATGAATCACGAGAAGGGGAGGAGAACAAGCATG
PN   GTTTTTTGTTATGATAAAAGAGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGACGAA 137 --AACTTCTTCTGATTTTATAG------------AAAATTGGAGGAACCTGTTATGACA
ST   TAAATCTGCTATGCTAGAAGTGTCTTCAGGGCAGGGTG --- TCTGGATGAGAGAAGACCGG 130 ---GGAACTTCTTTCAATTTGAAA-----------AAATTGGAGGAATTTTTTAATGTC
MN   ATTTTTTGATATGCTATAAGTGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGACTGA 138 ----GGCCTTCTTTCGATTTGTAA-----------AAATTGGAGGAATTTTTTTATGAA
SA   AAATTTAATAATGTAAAATTCATCTTCGGGGTCGGGTG --- TCTGGATGGGAGAAGATGGA 17 --------TCCTCCTATTCTTACG--------AGATGAATGGAAGGAGAAAATTGAATATG
EF   AAAAAATATAATACAAGGTTCGTCTTCAGGGGCAGGGT --- GTCTGGATAAAGAAGATAGG 33 ---CTACTCTATTTTTCCCTGCAGA------------AAAATAGGGTTTTTTTGTATGA
LLX  TTTTTGTGCTATAATAAAAATATCTTCAGGGCACCGTG --- TCTGGATGGAAGAAGATGAA 66 --TCAACTTCCTCGAAATTTGAAGAAT-TATTTTCTCATATTTGGAGGTTTTTTTATGT
LO   ATTGTAAGAAAATATTCGTTCATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGTTG 79 ---ATGCACAAACTCTCCCTCAACTTTTTTTA--------GTTGAGGTTTTTTATTTGC
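The downstream column of this alignment shows the candidate terminators: a hairpin followed by a run of T's. The sketch below illustrates that pattern on one row. It is an assumption-laden toy (perfect stem of fixed length, illustrative parameter names and thresholds), not the procedure used to produce the alignment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_terminator(seq, min_stem=8, max_loop=12, min_t_run=4):
    """Return (left_arm, loop, right_arm) for the first perfect hairpin of
    `min_stem` base pairs that is immediately followed by at least
    `min_t_run` T's, or None if no such hairpin exists.
    Alignment gaps ('-') are removed before scanning."""
    seq = seq.upper().replace("-", "")
    for right_end in range(2 * min_stem, len(seq) - min_t_run + 1):
        # The T-run must start right after the 3' arm of the stem.
        if seq[right_end:right_end + min_t_run] != "T" * min_t_run:
            continue
        right_arm = seq[right_end - min_stem:right_end]
        target = revcomp(right_arm)  # sequence the 5' arm must have
        for loop_len in range(3, max_loop + 1):
            left_start = right_end - 2 * min_stem - loop_len
            if left_start < 0:
                break
            if seq[left_start:left_start + min_stem] == target:
                loop = seq[left_start + min_stem:right_end - min_stem]
                return target, loop, right_arm
    return None

# Downstream region of the Bam row, copied from the alignment.
bam = "GTAAAGCCCCGAATGTGTAA---ACATTCGGGGCTTTTTGACGCCAAAT"
hairpin = find_terminator(bam)
```

On the Bam row this reports an 8-bp stem (GCCCCGAA paired with TTCGGGGC) directly upstream of the T-run; the real stem is longer, since the sketch only checks a fixed-length perfect stem.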

Attenuation of translation

EC   AATCCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGAGTAACG 59 ----------CTGCCCTGATTCTGGTAACCATAATTTTAGTGAGGTTTTT-------TACCATGAATCAGACGCTA
TY   AACCCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGGGTAACG 61 ----------CTGCCCTGATTCTGGTAACCATAATGTTAATGAGGTTTTTT------TACCATGAATCAGACGCTA
KP   ATCTCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGAGTAACG 61 ----------CTGCCCTGATTCTGGTAACCATAATTTTAATGAGGTTTTTT------TACCATGAATCAGACGCTC
HI   TTAGCTCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAAA 41 ----------CAGCCCTGATTCTGGTATTTAATTGAAATCTCAAAT-TAGGAAAT--TACTATGAATCAGTCAATT
VK   TATTTGCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAGC 76 ----------CAGCCCTGATTCTGGTATCTAAATATCTTTATATTTCAAGGAATT--TACTATGAATCAGTCTATT
AB   TAGGCGCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAAA 54 ----------CCGCCCTGATTCTGGTATAAATTCATCTTATTAAA-AAGGCATT---TACTATGAATCAGTCATTA
YP   ATGGGGCTTATTCTCAGGGCGGGGTG --- TCCGGATGGGAGAGAGTAACG 194 ----------CCGCCCTGATTCTGGTAATCCATAATTTTTTAATGAGGTTTCT---TTACCATGAATCAGACGCTT
VC   CACAACAATATTCTCAGGGCGGGGCG --- TCCGGATGAGAGAGAATGACA 83 ----------AAGCCCTGATTCTGGTCATTTTTT--------------GGAGTATT--ACCATGAATCAGTCCTCA
Spu  CTATCAACAATTCTCAGGGCGGGGTG --- TCCGGATGGAAGAGAATGTAA 145 ----------ACGCCCTGATTCTGGATATTCCCATGTCGTATTTTTGAAGGATATTAA-CCATGAATCAGTCTTTA
MLO  GACGTTAAAGTTCTCAGGGCGGGGTG --- TCCGGATGAAAGAGGACGAAA 44 -------CGTGCGTCCTGATTCTGGTTCGAAACGGA--------------AGGATGGACCCATGAATCAGCATTCC
AC   AAGCGACATCGCTTCAGGGCGGGGCG --- TCCGGATGAAAGAAGACGACG 51 ----------CAGTCCTGAAATGTTTAACCGTAATT-------------------TACGAGAGCATTTCATATGTC
BP   AAGCAGTACGTCTTCAGGGCGGGGTG --- TCCGGATGAGAGAAGATGTGC 62 ----------TAGCCCTGAAACGTTTTTCGCCATTTCCTTTTTT------------GCGAGAGCGTTTCAATGTCC
BPS  AGTCAGTGCGTCTTCAGGGCGGGGCG --- TCCGGATGAAAGAAGATGTGC 86 ----------GAGCCCTGAAACGTTTTTCGCCCATTCATGTTTC-----------GCGAGGAGCGTTTCACATCATG
BU   AATCAGTGCGTCTTCAGGGCGGGGTG --- GCCGGATGGAAGAAGATGTGC 99 ----------ATGCCCTGAAACGTTTTTCGCCCAACTTTT--------------GCGATGAGCGTTTCAACTATGT
REU  CATCGTTACGTCTTCAGGGCGGGGTG --- TCCGGATGAAAGAAGATGGGC 77 ----------ATCCCCTGAAACGCCCATCCATGGAAATCCACGCAC-------------GGAGCGTTTCAATGCTG
RSO  GCTTGGTACGTCTTCAGGGCGGGGTG --- TCCGGATGGAAGAAGATGTGC 80 ---------CGTGCCCTGGAACGTCTTGTCGCCCATTTCA---------------GCGAGGAGCGTTTCCATGTTG
PP   GGTCGGTCGGTCTTCAGGGCGGGGTG --- TCCGGATGAAAGAAGGCGTCA 50 ----------TCGCCCCGAGACGTTCATCGATCATTCA------------------CGAGGAGCGTTTCATGTTCA
PY   GCCGGTAACGTTCTCAGGGCGGGGTG --- CCGGATGAAGAGAGAGCGGGA 91 ----------ATGCCCTGTTTTTTCATTAAATT---------------------AAACAGGAGTCAGAACACGTGC
PU   CGGCGAAACGTTCTCAGGGCGGGGTG --- CCGGATGAAGAGAGAACGGGA 68 ----------ACGCCCTGTTTTTCACAC--------------------------AAACAGGAGTCAGAACATGCAA
PA   GGCCGTAACGTTCTCAGGGCGGGGTG --- CCGGATAAAGAGAGAACGGG 53 ---------AAAGCCCTGTTTTTCAC---------------------------GAAACAGGAGTTCGTCATATG--
BME  CGCGGGCTTGTTCTCGGGGCGGGGTG --- TCCGGATGGAAGAGAGCGAAT 54 ----------GCGCCCTGATTCTAGTTTCGTG--------------------------AGGAACCTATGAACCAAA
CAU  AATCCGAAGACCTTCGGGGCAAGGTG --- TCCGGATGGGAGAAGGTCGGC 116 ------CGCGATGCCCCGAAGGTGTG-----------------------------TTCAGGGGTGTCGCGATGAAC
TFU  GTACACACGCGTGCTCCGGGGTCGGT --- GGATGGGAGGTAGTACGTGGT 58 -------GCCTTACCCCGGAGCCTGACCT-------------------------GGCTAGGGGGAAGGCTTCTCGCATG
GLU  TGAGTTTTGTTCTCAGGGCGGGGCG --- TCCGGATGCAAGAGAACCG 32 ---------AAGGCCCCGAGGATTACATGCTTTTAAATCCTTTGAAAAGGGGACAAGATCATGAATCCTATAACCG
DR   GAACCGACCTCTTTCGGGGCGGGGCG --- TCCGGACGAAAGAAGGAGGAG 1 GACGCTCAGCTTGCCCCCCA------------------------------------GCAGGCGGCGTCCGCGTATG
SM   GTCGCAAGCGTTCTCAGGGCGGGGTG --- TCCGGATGGAAGAGAGCAAGC 45 ATCATTGGAAAAATGCCAACCCTGAAA-------------------GGCTTGAGACCATGACCATACTT
TQ   TTCGGCACCTCCTTCGGGGCGGGGTG --- TCCGGATGGGAGAAGGAGGGCCACTTGCGC
AMI  CTTACTCACAGTTTCAGGGCGGGGTG --- TCCGGATGGAAGAAACGGAGCGCCTTATGG

[Figure labels: the RFN element; SD-sequestor; antisequestor]

RFN: the mechanism of regulation

• Transcription attenuation

• Translation attenuation

Distribution of RFN-elements

Genomes                    Analyzed genomes   Genomes with RFN   RFN elements
α-proteobacteria                   8                  4                4
β-proteobacteria                   7                  4                4
γ-proteobacteria                  17                 15               15
δ- and ε-proteobacteria            3                  0                0
Bacillus/Clostridium              12                 12               19
Actinomycetes                      9                  4                4
Cyanobacteria                      5                  0                0
Other eubacteria                   7                  5                6
Total                             68                 47               52
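A quick consistency check of the distribution table can be sketched as follows. The numbers are copied from the rows above; the variable and function names are illustrative only. As transcribed, the rows sum to 68 analyzed genomes and 52 elements, matching the Total row, while the "genomes with RFN" column sums to 44 rather than the 47 given in the Total row.

```python
# (analyzed genomes, genomes with RFN, RFN elements) per taxonomic group,
# copied from the table rows.
rfn_table = {
    "α-proteobacteria": (8, 4, 4),
    "β-proteobacteria": (7, 4, 4),
    "γ-proteobacteria": (17, 15, 15),
    "δ- and ε-proteobacteria": (3, 0, 0),
    "Bacillus/Clostridium": (12, 12, 19),
    "Actinomycetes": (9, 4, 4),
    "Cyanobacteria": (5, 0, 0),
    "Other eubacteria": (7, 5, 6),
}

def column_sums(table):
    """Sum each of the three numeric columns over all groups."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))

def fraction_with_element(table):
    """Fraction of analyzed genomes in each group that carry the element."""
    return {group: with_rfn / analyzed
            for group, (analyzed, with_rfn, _) in table.items()}

sums = column_sums(rfn_table)
fractions = fraction_with_element(rfn_table)
```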

Phylogenetic tree of RFN-elements

YpaA: riboflavin transporter in Gram-positive bacteria

• 5 predicted transmembrane segments => a transporter
• Upstream RFN element (likely co-regulation with riboflavin genes) => transport of riboflavin or a precursor
• S. pyogenes, E. faecalis, Listeria sp.: ypaA, no riboflavin pathway => transport of riboflavin

Prediction: YpaA is a riboflavin transporter (Gelfand et al., 1999)

Verification:
• YpaA transports flavins (riboflavin, FMN, FAD) (by genetic analysis, Kreneva et al., 2000)
• ypaA is regulated by riboflavin (by microarray expression study, Lee et al., 2001)
• … via attenuation of transcription (and to some extent inhibition of translation) (Winkler et al., 2003)

More predicted (riboflavin) transporters

impX from Fusobacterium and Desulfitobacterium

– no similarity with any known protein; no homologs in other complete genomes

– 9 predicted TMS

– single RFN-regulated gene

pnuX from Actinomycetes (Corynebacterium, Streptomyces, Thermomonospora)

– no orthologs in other genomes

– 6 predicted TMS

– either a single gene or a part of the riboflavin operon

– regulated by RFN

– similar to the nicotinamide mononucleotide transporter PnuC from E. coli

thi-box and regulation of thiamine metabolism genes by thiamine pyrophosphate (Miranda-Rios et al., 2001)

TTCGGGATCCGCGGAACCTGA-TCAGGCTAA-TACCTGCG-AAGGGAACAAGAGTTA  THIC_EC
TTCGGGATCCGTTGAACCTGA-TCAGGTTAA-TACCTGCG-AAGGGAACAAGAGAAG  THIC_VC
GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAAGC  THIC_MLO
GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-CACTGGCG-TAGGGACGGTGCAGAC  THIC_SM
AGAAATACCCTTTACACCCGA-TCGGGATAA-TACCTGCG-TGGGGAGTTTTCACGG  THIC_NM
TTCTTAACCCTTTGGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGAAGTAGAGGAA  thiC_BS
CCGTCGACCGTACGAACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG  THIC_MT
GGATCGACCCTTTGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGAAATTATGTCG  THIT2_TVO
TCCTCGACCCCAAGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGG  thi1_TM

Notation: Red– Conserved nucleotides; Green– Purine or Pyrimidine conserved nucleotides; Blue– Non-conserved nucleotides

Alignment of THI-elements

Helices: 1 2 3 3' FACULTATIVE STEM-LOOP 2' 4 5 5' 4' 1'
         ----====>===> -=====> <===== ========> <======= <=== ===> =====> <===== <=== <====----

BACILLUS/CLOSTRIDIUM GROUP
BS_THIC   TAGTTACTGGGGGTGCCCGCT----------------TTCcgGGCTGAGAGAGAAGGCA-------------AGCTTCTTAACCCTTT---GGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGA-AGTAGAGGA
BS_TENA   TAACCACTAGGGGTGTCCTTC----------------ATAAGGGCTGAGATAAAAGTGT-------------GACTTTTAGACCCTCA---TAACTTGA-ACAGGTTCA-GACCTGCG-TAGGGA-AGTGGAGCG
BS_YLMB   TTCATCCTAGGGGTGCTTTG-------------------CGAAGCTGAGAGAGACTT-----------------TGTCTCAACCCTTT---TGACCTGA-TCTGGATCA-TGCCAGCG-GAGGGA-AGCGGTGAA
BS_YKOF   AAAGCACTAGGGGTGCTGT--------------------TTTGGCTGAGATAAAGCGCGGAA-----GAAACGCGCTTTGATCCCTTA---TGACCCGA-TCTGGATAA-TACCAGCG-TGGGGA-AGTGCAGGT
SA_TENA   GAACTACTAGGGGAGCCTAAT----------------GATATGGCTGAGATGAATT-------------------GTTCAGACCCTTA---TGACCTGA-TTTGGTTAG-TACCAACG-TAGGAA-AGTAGTTAT
SA_YKOE   CACACACTAGGGGTGTTT----------------------TATACTGAGATGAGGCTT---------------GCCCTCAAACCCTTT---GAACCTGA-TCTAGCTTG-AACTAGCG-TAGGAA-AGTGTTACT
LLX_YUAJ  TTTGCACAATGGGTCTATTGACAAA---------ACTGTCAGTAGCGAGA----------------------------AATACCATC----TGACCTGA-TCTGGGTAA-TGCCAGCG-TAGGAA-TGTGTTAAG
CA_THIS   ATAGTTAACGGGGAGCCTGTA-----------------GACAGGCTGAGAGTGGAATG--------------TGATTCCAGACCCTCA---TAACCTGA-TTTGGATAA-TGCCAACG-TAGGGA-GTTAATGCA
CA_YUAJ   TATGTGCTAGGGGTGCCTT---------------------TAGGCTGAGAAACAGTTT--------------GTCACGTTAACCCTT-----AACCTGA-TCTGGATAA-TACCAGCG-TAGGGA-AGCAGTTTG
ST_YUAJ   TTTCACAAAGGAGTGCTT-----------------------TGGCTGAGATCGCAA------------------TTGCGAAATCCTGA---GGACCTGA-TCTTGTTAG-TACAAGCG-TAGGGA-TTGTGACCA
DHA_THIC  TAATCACTAGGGGGGCCGAATA---------------AGGTCGGCTGAGATAAAGGACCCA---------AGAATCCTTTGACCCTT-----AACCTGA-TCTGGGTAA-TGCCAGCG-TAGGGAAGGTGGATAA
LMO_TENA  GAAAAACTAGGGGGGCCGAT-------------------TCTGGCTGAGATAGGAAGGTAAT-----------GCTTTCTGACCCTTT---GAACCTGT-TT--GTTAG-TGCAAGCG-TAGGGA-AGTGAATGT
LMO_YUAJ  TTACCACAGGGGGGGCTTC---------------------TTAGCTGAGATTGAGTCCACGTGT-----TTTTGGATTCTGACCCTTT---GAACCTGT-TC--GTTAA-TACGAGCG-TAGGGA-TTGTGGCGA

PROTEOBACTERIA
EC_THIB   GTTCTCAACGGGGTGCCACGCGT------------ACGCGTGCGCTGAGAAA---------------------------ATACCCGTCGA---ACCTGA-TCCGGATAA-CGCCGGCG-AAGGGATTTGAGGC
EC_THIM   AAACGACTCGGGGTGCCCTTCTGC-------------GTGAAGGCTGAGAAA----------------------------TACCCGTATC---ACCTGA-TCTGGATAA-TGCCAGCG-TAGGGA-AGTCACG
EC_THIC   TTTCTTGTCGGAGTGCCTTA-------------------ACTGGCTGAGACCGTTT------------------ATTCGGGATCCGCGGA---ACCTGA-TCAGGCTAA-TACCTGCG-AAGGGA-ACAAGAG
VC_THIC   CCACTTGTCGGAGTGCCAT---------------------TGGGCTGAGACCGTTT------------------ATTCGGGATCCGTTGA---ACCTGA-TCAGGTTAA-TACCTGCG-AAGGGA-ACAAGAG
VC_THID   CCTGTAGTCGGGGAGCCTGAGAG-- 66 5 71 -AATTAAAGGCTGAGATCGCGT-------------------AGCGAGACCCGTTGA---ACCTGA-TTCAGTTAG-GACTGACG-TAGGGA-ACTATCC
VC_THIB   CCCACTCACGGGGGGCCACCCATTCAT-------CCGAATGGCGCTGAGATCAAGCAC---------------TGCTTGGGACCCGCA 21 -ACCTGA-ACCAGATAA-TGCTGGCG-TAGGAATTGAGCTA
XFA_THIC  TTTGAAGCGGGGGTACCATAGCCA------------AGCTGCGGTTGAGAC----------------------------ACACCCTTCGA---ACCTGA-TCCGGTTTA-CACCGGCG-TAGGAAAGCTTCGT
MLO_THIC  CATTCACCAGGGGAGTCCCGG----------------CAAGGGGCTGAGATACTGCTGGCTTTC------GCGGCGCAGTGACCCGTTGA---ACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAA
MLO_THIB  CGCTCTAACGGGGTGCCGGA------ 5 3 5 -----GACCGGCTGAGAGGCAGT------------------CTCGCCAACCCGCTGA---ACCTGA-TCCGGTTTG-TACCGGCG-GAGGGA-TTAGACG
MLO_YK    GCCCATCCACAGGGGTGCTCCGTAC-------------GGTCGGGGCTGAGACGGGGGCGG-----------CAAGCCCACAGACCCTAGA----AGCTGA-TCTGGGTAA-TACCAGCG-GAGCGA-GGCGGGCG
NX_CITX   CTCCTTGTCGGAGTGCCGCCGC---------------CGGGCGGCTGAGATTGCGA------------------AAGCAGAATCCGTAGA---ACCTGT--CGGGGTAA-TGCCTGCG-TAGGAA-ACAAACC
NX_THIC   ATTGAAACAGGGGTGCTGCCTGAT----------GTTTAGGCGGCTGAGAA----------------------------ATACCCTTTAC---ACCCGA-TCGGGATAA-TACCTGCG-TGGGGA-GTTTTCA

ACTINOBACTERIAE
MT_THIO   CTGTAGACACGGGAGTCCCGGG--------------AGCGGGGTCTGAGAGTGGGCGCGCCT-------------GCCCTTACCGTCAC----ACCTGA-TCCGGATCA-TGCCGGCG-AAGGGAGGTCAAGGATG
MT_THIC   GTACCCACGCGGGAGCGCACGC--------------CGAGTGCGCTGAGAGGACGGCTCGGG------------GCCGTCGACCGTACGA---ACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG
CGL_THIC  CAGTCCCCACGGGCGCCCGA-----------------GCACGGGCTGAGATCGCGCTGATT---------GCTGCGCGAGCACCGTTTGA---ACCTG--TCCGGTTAG-CACCGGCG-AAGGAAGAGAGGAATGGTGCAATG
CGL_THID  ACTAGGCACGGGGTGCCAACCGGATGG---AAAAATTCCGGAGGCTGAGAAA---------------------------ACACCCGTTGA---ACCTGC-TCTAGCTCG-TACTAGCG-AAGGGATGGCCTTAACGTG
CGL_THIE  CTTACCCCACGGGTGCCCAAT---------------GCATTGGGCTGAGATTGCGCGCTGT---------TGCTGCGCGGGACCGTTCGA---ACCTG--TCTGGTTAA-CACCAGCG-AAGGAAGCGAGGATTGATTGTCCCGTG
CGL_YKOE  TCATAGACACGGGTGCTCGGTGA------------AAATCCGGGCTGAGATCTGGCA----------------TAGCCACGACCGTCGA----ACCTG-ATCCGGATAA-TGCCGGCG-ATAGGGAGGAAAAATATG
CGL_OARX  TAGTGACACGGGGTGCAAAAGCACTTT----AAAAAAGCTTTCGCTGAGATT---------------------------ACACCCGTCGA---ACCTG-ATCCAGTTAG-TACTGGCG-AAGGGACTGTCGCAT

CYANOBACTERIA
NPU_THIC  TCCATGCTAGGGGTGCCTACAT---------------AACCAGGCTGAGATC---------------------------ACACCCTTAAC---ACCTGAGTCTGGGTAA-TACCAGCG-GAGGGAAGCTGTTTATTG
CY_THIC   CCATAGCTAGGGGTGTCTAGAA---------------AGCTAGGCTGAGAA----------------------------AAACCCTTAGA---ACCTGAGACTGGGTAA-TACCAGCG-GAGGGAAGCTCACCATTC
AN_THIC   TCCATGCTAGGGGTGCTTGCAC---------------TAACAGGCTGAGATT---------------------------ACACCCTTAAC---ACCTGAGACTGGGTAA-TACCAGCG-AAGGGAAGCTGTTTATTG

THERMUS/DEINOCOCCUS, THERMOTOGALES, Fusobacterium, CFB group
DR_THIB   CGCGTCACCGGGGGTGCCCTGCTT------------CGGCAGCGGCTGAGAAC---------------------------ACACCCCAGGA---ACCTGA-ACCGGGTCA-TTCCGGCG-GAGGGAGTGTGATGC
DR_THIC   ATCGTCAACAGGGGTGCCTCCGCATA--------TGGGCCGGAGGCTGAGAGGGCAACT---------------CGGGCCTAACCCTATGA---ACCTGA-ACTGGTTAG-CACCAGCG-GAGGGA-GTGTGACG
TQ_THIB   GGCCGTCACCGGGGGTGCCCCA------------------AAAGGGCTGAGAGC---------------------------ATACCCTTGGA---ACCTGA-TCCGGGTCA-TGCCGGCG-TAGGGAAGGTGACGGCC
TM_THI1   CCTTCCCCAGGGGGAGCTCCTAT---------------TCCGGGGCTGAGAGGAGGACGG-------------AAGTCCTCGACCCCAAGA---ACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGGA
FN_THIC   TATATGTACTGGGGAGCTT----------------------TGTGCTGAGATTAGAACCT------------TTTTTCTTAGACCCATAGT---ACCT-GA-TTTGGATAA-TGCCAACG-AAGGGA-GTACCA
FN_THIX   ACTAGTTACAAGGGAGTTAATA-----------------AATTGACTGAGAAAAGGATG--------------TGAGCCTTGACCTTTTG----ACCT-GA-TTTGGATAA-TGCCAACG-TAGGAA--GTAAA
PG_THIS   AGACCGCTACGGGGGTGCTTGCCG--- 4 3 4 -GATACGGCAGGCTGAGAT---------------------------AATACCCATAG---ACCT-GA-TCCGGATAA-TACCGGCG-GAGGGAT-GTAG
PG_OMR    ATTGGGAGAAGGGGTGCTTCCTGTA--- 3 7 3 --GTGGATGGCTGAGAAC---------------------------AAACCCTCATC---ACCT-GA-ACCGGATAA-TACCGGCG-TAGGAAA-CTCTC
BX_THIS   TAAAGACAAAGGGGTGCCACC------------------CGGTGGCTGAGATT---------------------------ATACCCTAAGA---ACCT-GA-TGCAGTTAG-TACTGCCG-AAGGGA-TTGTG

ARCHAEA
TAC_T1    GGTGTGGTGGGGGAGCTCCAT-----------------AAGGGGCTGAGAGGATCCGG---------------ATGGATCGATCCCTGGA---ACCTGA-TCCGGGTAA-TACCGGCG-GAGGGAAATTATG
FAC_T1    AGTTATACCGGGGAGCTAA---------------------AATGCTGAGAGGATAA-------------------GGATCGACCCGTGCA---ACCTGA-TCCGGACAA-TACCGGCG-GAGGGAGATGGATA

Conserved secondary structure of the THI-element

[Figure: conserved secondary structure of the THI-element. Labels: Thi-box; helices 1 to 5; facultative stem-loop.]

Capitals: strongly conserved positions. Dashes and points: obligatory and facultative base pairs

Degenerate positions: R = A or G; Y = C or U; K = G or U; M= A or C; N = any nucleotide

THI: the mechanism of regulation

• Translation attenuation: Thermus/Deinococcus group, CFB group, Proteobacteria, Actinobacteria, Cyanobacteria, Archaea

• Transcription attenuation: Bacillus/Clostridium group, Thermotoga, Fusobacterium, Chloroflexus

Distribution of THI-elements

Genomes                          Analyzed genomes   Genomes with THI   THI elements
α-proteobacteria                         7                  7               15
β-proteobacteria                         6                  6               12
γ-proteobacteria                        18                 17               38
δ- and ε-proteobacteria                  3                  1                1
The Bacillus/Clostridium group          18                 18               51
Actinomycetes                            9                  9               25
Cyanobacteria                            5                  5                5
Other eubacteria                        14                 11               11
Archaea (Thermoplasma)                  17                  3                6
Total                                   97                 77              164
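Unlike RFN, THI-elements often occur several times per genome, which the table makes easy to quantify. The sketch below computes elements per THI-containing genome. The numbers are copied from the rows above; the Greek prefixes of the proteobacterial rows are assumed from the order of the RFN table, and all names are illustrative.

```python
# (group, analyzed genomes, genomes with THI, THI elements), from the table.
thi_table = [
    ("alpha-proteobacteria", 7, 7, 15),
    ("beta-proteobacteria", 6, 6, 12),
    ("gamma-proteobacteria", 18, 17, 38),
    ("delta- and epsilon-proteobacteria", 3, 1, 1),
    ("Bacillus/Clostridium group", 18, 18, 51),
    ("Actinomycetes", 9, 9, 25),
    ("Cyanobacteria", 5, 5, 5),
    ("Other eubacteria", 14, 11, 11),
    ("Archaea (Thermoplasma)", 17, 3, 6),
]

def elements_per_genome(table):
    """Mean number of THI-elements per genome that has at least one."""
    return {group: elements / with_thi
            for group, _, with_thi, elements in table if with_thi > 0}

thi_totals = tuple(sum(row[i] for row in thi_table) for i in (1, 2, 3))
thi_ratios = elements_per_genome(thi_table)
overall_ratio = thi_totals[2] / thi_totals[1]
```

With these numbers the column sums reproduce the Total row (97, 77, 164), and the overall mean is about 2.1 elements per THI-containing genome.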

Mandal et al., 2003: THI in 3’UTR (plants). THI in untranslated intron (fungi)

Predicted THI-regulated genes: transporters

yuaJ: predicted thiamin transporter (possibly H+-dependent)

• Found only in the Bacillus/Clostridium group;
• Occurs in genomes without the thiamin pathway (Streptococci);
• Has 6 predicted transmembrane segments (TMS);
• Regulated by THI-elements in all cases with only one exception (E. faecalis);
• In B. cereus, the thiamin uptake is coupled to proton movement (Arch Microbiol, 1977).

thiX-thiY-thiZ and ykoF-ykoE-ykoD-ykoC: predicted ATP-dependent HMP transporters

• Found in some Proteobacteria and Firmicutes;
• Not found in genomes without the thiamin pathway;
• Always co-occur with thiD and thiE;
• In Pasteurellae, Brucella and some Gram-positive cocci, they are present without thiC;
• Regulated by THI-elements in all cases with only one exception (T. maritima);
• Putative substrate-binding protein ThiY is homologous to Thi12 from yeast, known to be involved in the biosynthesis of HMP

Predicted THI-regulated genes: more transporters

• thiU from P. multocida and H. influenzae belongs to the possible thiMDE-thiU operon, has 12 predicted TMS; similar to proline permease; no orthologs in other genomes

• thiV from Methylobacillus and H. volcanii is clustered with thiamin genes or has THI-elements; has 13 predicted TMS; similar to the pantothenate symporter PanF from E. coli; no orthologs in other genomes

• thiW from S. pneumoniae and E. faecalis forms an operon with thiamin genes, has 5 predicted TMS; no homologs in other complete genomes

• pnuT from the CFB group of bacteria forms an operon with thiamin-related genes; has 6 TMS; similar to the nicotinamide mononucleotide transporter PnuC from E. coli; no orthologs in other genomes

• cytX from Neisseria and Chloroflexus has 12 TMS, is similar to the cytosine permease CodB from E. coli, and forms an operon with thiamin genes in Neisseria and Pyrococcus; homologs in other genomes are not regulated by THI-elements.

• thiT1 and thiT2 from three different Thermoplasma (Archaea) are two paralogous genes; have 9 TMS; belong to the MFS family of transporters. This is the first example of THI-element-regulated genes in Archaea

The PnuC family of transporters

[Figure labels: the RFN elements; the THI elements]

Predicted THI-regulated genes: enzymes

• thiN: non-orthologous displacement of thiE
– Separate gene in archaea or with thiD (in M. thermoautotrophicum)
– Always present if ThiD is present and ThiE is absent

• tenA: gene of unknown function somehow associated with thiD
– Found in most firmicutes, some proteobacteria and archaea; ThiD-TenA gene fusions in some eukaryotes;
– Forms clusters with thiD and other THI-elements-regulated genes in most bacteria;
– Single tenA gene is also regulated by THI-elements in some bacteria;
– Not found in genomes without the thiamin pathway;
– Always co-occurs with the thiD and thiE genes

• tenI: gene of unknown function, thiE paralog
– Found in some unrelated bacteria;
– Forms a separate branch in the phylogenetic tree for thiE;
– In most bacteria, located in clusters of THI-elements-regulated genes.

• ylmB from Bacilli belongs to the ArgE/dapE/ACY1/CPG2/yscS family of metallopeptidases; regulated by the THI-elements in B. subtilis and B. halodurans, not regulated in B. cereus.

• thi-4 from Thermotoga maritima belongs to a family of putative thiamine biosynthetic enzymes from archaea and eukaryotes. Located in one operon with thiC and thiD.

• oarX from Methylobacillus and Staphylococcus is a single THI-elements-regulated gene; belongs to the short-chain dehydrogenase/reductase (SDR) superfamily

Metabolic reconstruction of the thiamin biosynthesis

[Figure: metabolic reconstruction of thiamin biosynthesis. Labels: thiN (confirmed); Gram-positive bacteria; Gram-negative bacteria; transport of HMP; transport of HET.]

THI-elements in delta-proteobacteria: co-operative binding?

• Tandem arrangement of THI-elements upstream of the main thiamine operon thiSGHFE1 in Desulfovibrio spp.

• Tandem arrangement of glycine riboswitches in B. subtilis and V. cholerae (Mandal et al., 2004):
– co-operative binding of the cofactor (glycine)
– rapid activation/repression
– same arrangement in all glycine riboswitches
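Why co-operative binding gives a sharper ("rapid") switch can be illustrated with the Hill equation. This is a textbook model chosen here as an assumption; the slides do not state which binding model applies, and the Hill coefficient n = 2 for a tandem pair is only an idealised upper bound.

```python
def hill_occupancy(ligand: float, kd: float, n: float) -> float:
    """Fraction of riboswitches in the ligand-bound state under a Hill model.

    ligand and kd are in the same concentration units; n is the Hill
    coefficient (n = 1: independent binding, n > 1: co-operative binding).
    """
    return ligand ** n / (kd ** n + ligand ** n)

def fold_change_10_to_90(n: float) -> float:
    """Fold increase in ligand needed to go from 10% to 90% occupancy.

    Solving the Hill equation for 10% and 90% occupancy gives
    L90 / L10 = 81 ** (1 / n).
    """
    return 81.0 ** (1.0 / n)

independent = fold_change_10_to_90(1)   # single aptamer
cooperative = fold_change_10_to_90(2)   # idealised tandem aptamers
```

Under this model a single aptamer needs an 81-fold change in ligand concentration to move from 10% to 90% occupancy, whereas a fully co-operative pair needs only a 9-fold change.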

B12-box and regulation of cobalamin metabolism genes by adenosylcobalamin (Nou & Kadner, 2000; Ravnum & Andersson, 2001; Nahvi et al., 2002)

• Long mRNA leader is essential for regulation of btuB by vitamin B12.

• Involvement of highly conserved B12-box rAGYCMGgAgaCCkGCcd in regulation of the cobalamin biosynthetic genes (E. coli, S. typhimurium)

• Post-transcriptional regulation: an RBS-sequestering hairpin is essential for regulation of the btuB and cbiA genes.

• Ado-CBL is an effector molecule involved in the regulation of the cobalamin biosynthesis genes
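The B12-box consensus above is written in IUPAC nucleotide codes, so a first-pass scan for candidate sites can be sketched as a regular-expression search. This is only an illustration, not the method used in the work presented: the helper names and the test sequences are made up, and lowercase (less conserved) positions are matched as strictly as capitals, which is a simplification.

```python
import re

# IUPAC nucleotide codes expanded to RNA character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "M": "[AC]", "K": "[GU]",
    "S": "[CG]", "W": "[AU]", "H": "[ACU]", "B": "[CGU]",
    "V": "[ACG]", "D": "[AGU]", "N": "[ACGU]",
}

B12_BOX = "rAGYCMGgAgaCCkGCcd"  # consensus quoted on the slide


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex (case is ignored)."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def find_sites(consensus, seq):
    """Return 0-based start positions of non-overlapping exact matches."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    return [m.start() for m in consensus_to_regex(consensus).finditer(rna)]


if __name__ == "__main__":
    # Hypothetical sequence: two flanking bases plus one B12-box instance.
    print(find_sites(B12_BOX, "CCAAGTCAGGAGACCTGCCT"))
```

A real search would also need to tolerate mismatches at the lowercase positions and to check the surrounding secondary structure; an exact regex match is only the simplest possible filter.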

Conserved RNA secondary structure of the regulatory B12-element

[Figure: consensus secondary structure of the B12-element. Surviving labels: 5' and 3' ends; base stem; helices P0 to P6; B12 box; VS; BI and BII; Add-I and Add-II; facultative stem-loop. Legend: the Bacillus/Clostridium group; other taxonomic groups; -proteobacteria.]

[Figure, panels A and B: the B12-element (helices P0 to P6) drawn in two alternative conformations, labelled "+Ado-CBL" and "Ado-CBL". Panel A: pseudoknot, terminator, antiterminator. Panel B: pseudoknot, RBS-sequestor hairpin, antisequestor.]

The predicted mechanism of the B12-mediated regulation of cobalamin genes

B12-element regulates cobalamin biosynthetic genes and transporters, cobalt transporters and a number of other cobalamin-related genes.

Distribution of B12-elements in bacterial genomes

Metabolic reconstruction of cobalamin biosynthesis: new enzymes and transporters

Cobalt ion transport: cbiMNQO, hoxN, hupE, cbtAB, cbtC, cbtD, cbtE, cbtG, cnoABCD

If a bacterial genome contains both B12-dependent and B12-independent isoenzymes, the genes encoding the B12-independent isoenzymes are regulated by B12-elements.

Ribonucleotide reductases (presence in a genome: + present, - absent)

NrdJ (B12-dependent)    NrdAB/NrdDG (B12-independent)
+                       -
-                       +
+                       +

Methionine synthase

MetH (B12-dependent)    MetE (B12-independent)
+                       -
-                       +
+                       +

[Figure labels: "B12", three occurrences, marking B12-elements.]
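The rule stated on this slide can be written down as a small lookup. The sketch below only restates the slide's rule; the function name and the input format (a set of enzyme names found in a genome) are assumptions made for illustration.

```python
# B12-dependent enzyme -> its B12-independent isoenzymes (names from the slide).
ISOENZYME_PAIRS = {
    "NrdJ": ("NrdAB", "NrdDG"),
    "MetH": ("MetE",),
}


def expected_b12_regulated(enzymes_present):
    """Return the B12-independent isoenzymes expected to carry a B12-element.

    Per the slide's rule, this applies only when the genome also encodes
    the B12-dependent counterpart.
    """
    present = set(enzymes_present)
    regulated = set()
    for dependent, independents in ISOENZYME_PAIRS.items():
        if dependent in present:
            regulated.update(e for e in independents if e in present)
    return regulated


if __name__ == "__main__":
    # Hypothetical genome with both methionine synthases but only NrdAB.
    print(expected_b12_regulated({"MetH", "MetE", "NrdAB"}))
```

The biological reading is that the B12-independent enzyme is switched off when coenzyme B12 is available and the B12-dependent isoenzyme can take over.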

LYS-element: lysine riboswitch

[Figure: consensus secondary structure of the LYS-element. Surviving labels: 5' and 3' ends; base stem; helices P1 to P7.]

Reconstruction of the lysine metabolism

[Figure: pathway diagram. Intermediates: L-aspartate; -aspartyl-phosphate; aspartate semialdehyde; homoserine; dihydrodipicolinate; tetrahydrodipicolinate; N-acetyl-2-amino-6-ketopimelate and N-succinyl-2-amino-6-ketopimelate; N-acetyl-L,L-diaminopimelate and N-succinyl-L,L-diaminopimelate; L,L-diaminopimelate; meso-diaminopimelate. Also shown: lysine transport.]

Gene labels in the figure: lysC, dapG, yclM; lysC, thrA, metL; asd; hom; thrA, metL; dapA; dapB; dapD; ykuR; dapC (argD); ddh; patA; dapE; dapF, dal; lysA.

Predicted genes are boxed (pathway of acetylated intermediates in B. subtilis).

Regulation of lysine catabolism: the first example of an activating riboswitch

• LYS-elements upstream of the pspFkamADEatoDA operon in Thermoanaerobacter tengcongensis and the kamADElysE operon in Fusobacterium nucleatum:
– lysine catabolism pathway
– the LYS-element overlaps a candidate terminator
=> acts as an activator

• similar architecture of activating adenine riboswitch upstream of purine efflux pump ydhL (pbuE) in B. subtilis (Mandal and Breaker, 2004)

S-box (SAM riboswitch)

[Figure: consensus secondary structure of the S-box. Surviving labels: 5' and 3' ends; base stem; helices P1 to P5.]

Reconstruction of the methionine metabolism

[Figure: pathway diagram. Metabolites: aspartate semialdehyde; homoserine; threonine; O-acetylhomoserine; sulfide; cystathionine; homocysteine; methionine; S-adenosyl-methionine (SAM); S-adenosyl-homocysteine (SAH); S-ribosyl-homocysteine (SRH); MTA; methylthioribose (MTR); THF; methylene-THF; methyl-THF.]

Gene labels in the figure: hom; metX; metY; metI; yrhB; metC; yrhA; metF; metE; metH; metK; mtn; mtnKSUVWXYZ; yxjH*; cysH-...metB.

Predicted genes are marked by * (transport, salvage cycle).

A new family of amino acid transporters

[Figure: phylogenetic tree of the transporter family. Clades: LysT, MetT, TyrT, MleN (malate/lactate). Group labels: Archaea; clostridia; Pasteurellaceae. Leaves are gene locus tags (for example BC1434, FN0978, OB2874, BS-yheL, BS-mleN, VC2037, CPE2317).]

Legend: S-box (rectangle frame); MetJ (circle frame); LYS-element (circles); Tyr-T-box (rectangles).

Regulation of reverse pathway Met-Cys in Clostridium acetobutylicum

[Figure: the ubiG, yrhA, yrhB locus with a sense transcript and an antisense transcript. Regulatory elements shown: Cys-T-box and S-box. Metabolite labels: cysteine; S-adenosylmethionine.]

Three methionine regulatory systems in Gram-positive bacteria: loss of S-box regulons

• S-boxes (riboswitch)
– Bacillales
– Clostridiales
– the Zoo:
  • Petrotoga
  • actinobacteria (Streptomyces, Thermobifida)
  • Chlorobium, Chloroflexus, Cytophaga
  • Fusobacterium
  • Deinococcus
  • proteobacteria (Xanthomonas, Geobacter)

• Met-T-boxes (Met-tRNA-dependent attenuator)
– Lactobacillales

• MET-boxes (transcription factor MtaR)
– Streptococcales

[Figure: tree with branches Lact., Strep., Bac., Clostr. and the Zoo; note: MetJ, MetR in proteobacteria.]

Riboswitches in the Sargasso sea metagenome

• 125 THI-elements

• 38 LYS-elements

• 25 B12-elements

• 9 RFN-elements

• 3 S-boxes

Conserved structures of known riboswitches

[Figure: consensus secondary structures of the RFN-element, THI-element, B12-element, S-box, G-box and LYS-element. Surviving labels: 5' and 3' ends; base stem; helices P1 to P7; B12 box; variable (Var) and additional (Add, Add I, Add II, Add III) regions.]

Characterized riboswitches (more are predicted)

Each entry lists: regulated processes; effector; distribution.

• RFN: riboflavin biosynthesis and transport; FMN (flavin mononucleotide); Bacillus/Clostridium group, proteobacteria, actinobacteria, other bacteria

• THI: biosynthesis and transport of thiamin and related compounds; TPP (thiamin pyrophosphate); Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, other bacteria, archaea (thermoplasmas), plants, fungi

• B12: biosynthesis of cobalamin, transport of cobalt, cobalamin-dependent enzymes; coenzyme B12 (adenosyl-cobalamin); Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, spirochaetes, other bacteria

• S-box: metabolism of methionine and cysteine; SAM (S-adenosyl-methionine); Bacillus/Clostridium group and some other bacteria

• LYS: lysine metabolism; lysine; Bacillus/Clostridium group, enterobacteria, other bacteria

• G-box: metabolism of purines; purines; Bacillus/Clostridium group and some other bacteria

• glmS: synthesis of glucosamine-6-phosphate; glucosamine-6-phosphate; Bacillus/Clostridium group

• gcvT: catabolism of glycine; glycine; Bacillus/Clostridium group

Mechanisms

[Figure: three schemes (A, B, C) of an RNA-element in the 5' region upstream of the regulated genes, each drawn with the effector (+ Effector) and without it (- Effector). With the effector bound, a regulatory hairpin forms: a terminator of transcription followed by a UUUUUUUU run in the case of regulation of transcription, or an RBS-sequestor in the case of regulation of translation. Without the effector, the alternative antiterminator/antisequestor structure forms between the numbered segments (1, 2, 3).]

glmS: ribozyme, cleaves its own mRNA (the Breaker group)
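The transcriptional mechanism on this slide depends on a terminator, that is, a hairpin immediately followed by a run of U residues. A deliberately naive scan for such a signal might look like the sketch below. It is not the software used for this work: the function names, the default stem, loop and U-run lengths, and the example sequence are all hypothetical, and only perfect Watson-Crick stems are accepted (no G-U pairs, bulges or mismatches).

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def find_terminator_like(seq, stem=6, min_loop=3, max_loop=8, u_run=5):
    """Return (start, end) of the first perfect hairpin followed by a U-run.

    start is the first base of the 5' stem arm; end is one past the last
    base of the 3' stem arm. Returns None if nothing is found.
    """
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    for i in range(n - 2 * stem - min_loop):
        target = reverse_complement(rna[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the 3' stem arm
            if j + stem + u_run > n:
                break
            if rna[j:j + stem] == target and rna[j + stem:j + stem + u_run] == "U" * u_run:
                return i, j + stem
    return None


if __name__ == "__main__":
    # Hypothetical sequence: 6-bp stem, GAAA loop, then eight U residues.
    print(find_terminator_like("CCGCCGCCGAAAGGCGGCUUUUUUUUAA"))
```

In a riboswitch context the interesting step comes next: checking whether one arm of this hairpin can alternatively pair with part of the conserved RNA-element, which is what makes the terminator conditional on the effector.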

Properties of riboswitches

• Direct binding of ligands
• Same structure, different mechanisms
• Distribution in all taxonomic groups:
– diverse bacteria
– archaea: thermoplasmas
– eukaryotes: plants and fungi
• Lineage-specific features…
• … horizontal transfer, duplications, lineage-specific loss
• Correlation of the mechanism and taxonomy:
– attenuation of transcription (anti-anti-terminator): Bacillus/Clostridium group
– attenuation of translation (anti-anti-sequestor of translation initiation): proteobacteria
– attenuation of translation (direct sequestor of translation initiation): actinobacteria

• Andrei Mironov
– software, genome analysis, conserved RNA patterns

• Alexei Vitreschak
– analysis of RNA structures

• Dmitry Rodionov
– metabolic reconstruction

• Support:
– Howard Hughes Medical Institute
– INTAS
– Russian Fund of Basic Research
– Russian Academy of Sciences