Post on 18-Dec-2015
Comparative analysis of ribosomal proteins in complete genomes: ribosome “striptease” in Archaea
Odile Lecompte, Raymond Ripp, Jean-Claude Thierry, Dino Moras and Olivier Poch
Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS, INSERM, ULP), BP163, 67404 Illkirch Cedex, France
[Figure: labelled proteins of the small ribosomal subunit: S2p, S3p, S4p, S5p, S6p, S7p, S8p, S9p, S10p, S11p, S12p, S13p, S14p, S15p, S16p, S17p, S18p, S19p, S20p, Thx]
A comprehensive investigation of ribosomal genes in complete genomes from 66 different species allows us to address the distribution of r-proteins between and within the three primary domains. Thirty-four r-protein families are represented in all domains, but 33 families are specific to Archaea and Eucarya, providing evidence for specialisation at an early stage of evolution between the bacterial lineage and the lineage leading to archaea and eukaryotes. With only one specific r-protein, the archaeal ribosome appears to be a small-scale model of the eukaryotic one in terms of protein composition. However, the mechanism of evolution of the protein component of the ribosome appears dramatically different in Archaea. In Bacteria and Eucarya, only a restricted number of ribosomal genes can be lost, with a bias toward losses in intracellular pathogens. In Archaea, losses involve 15% of the ribosomal genes, revealing an unexpected plasticity of the translation apparatus, and the pattern of gene losses indicates a progressive elimination of ribosomal genes in the course of archaeal evolution. This first documented case of reductive evolution at the domain scale provides a new framework for discussing the shape of the universal tree of life and the selective forces directing the evolution of prokaryotes.
[Figure: Venn diagram of r-protein families in Bacteria (B), Archaea (A) and Eucarya (E); the two values in parentheses are the sub-counts printed in the figure]
B: 23 (8;15)
A: 1 (0;1)
E: 11 (4;7)
BAE: 34 (15;19)
AE: 33 (13;20)
BA: 0
BE: 0
Bacteria: 57 (23;34)
Archaea: 68 (28;40)
Eucarya: 78 (32;46)
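The per-domain totals in the distribution figure follow from the region counts by simple addition. The sketch below checks that arithmetic; the dictionary layout and function name are illustrative, and the sub-counts of the two empty regions (BA and BE), which the figure does not print, are assumed to be zero.

```python
# Family counts per Venn region, copied from the distribution figure.
# Each value is (total, first sub-count, second sub-count) as printed there.
# Assumption: the empty regions BA and BE have zero sub-counts.
venn = {
    "B": (23, 8, 15),
    "A": (1, 0, 1),
    "E": (11, 4, 7),
    "BAE": (34, 15, 19),
    "AE": (33, 13, 20),
    "BA": (0, 0, 0),
    "BE": (0, 0, 0),
}


def domain_total(letter):
    """Sum every region whose label contains the given domain letter."""
    regions = [counts for label, counts in venn.items() if letter in label]
    return tuple(sum(column) for column in zip(*regions))


totals = {domain: domain_total(domain) for domain in "BAE"}
all_families = sum(counts[0] for counts in venn.values())

print(totals)
print(all_families)
```

The computed totals can be compared with the Bacteria, Archaea and Eucarya totals printed in the figure, and the overall sum with the 102 families used as the starting set.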
An initial set of ribosomal proteins classified into 102 families was obtained from http://www.expasy.ch/cgi-bin/lists?ribosomp.txt. For each family, representatives of various lineages across Bacteria, Archaea and Eucarya were used as probes and systematically compared to a non-redundant protein database consisting of SwissProt, SpTrEMBL and SpTrEMBLNEW using the BlastP program (1) with a cut-off of E<0.001. The results of the BlastP comparison were cross-validated by a TBlastN search against a database of complete genomes from 66 different species. The putative new gene sequences detected by the TBlastN searches were examined in the light of their genomic context to eliminate false-positive “hits”. For each r-protein family, the likely r-protein sequences obtained by the BlastP and TBlastN searches were included in a multiple alignment constructed with MAFFT (2). All alignments were refined with RASCAL (3) and their quality was assessed with NorMD (4). These alignments were manually examined to remove the false positives observed in some ribosomal protein families, in particular those containing ubiquitous RNA-binding domains.
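As an illustration of the E<0.001 cut-off, here is a minimal sketch that filters BLAST hits by E-value. It assumes the hits are available in the 12-column tab-separated tabular format, with the E-value in the 11th column; the poster does not say how the output was actually parsed. The function name is illustrative, the first sample row is built from the RL40_METJA / RL40_HUMAN hit reported on the poster, and the second row is entirely hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


def significant_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) for rows whose E-value is below cutoff.

    Assumes 12-column tabular BLAST output: query id, subject id,
    % identity, alignment length, mismatches, gap openings, query start,
    query end, subject start, subject end, E-value, bit score.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            yield row[0], row[1], evalue


# Row 1: values from the RL40 hit shown on the poster (Expect = 1.8).
# Row 2: hypothetical identifiers and values, for illustration only.
sample = io.StringIO(
    "RL40_METJA\tRL40_HUMAN\t52.94\t34\t13\t2\t13\t43\t17\t50\t1.8\t31.6\n"
    "QUERY_X\tSUBJECT_Y\t61.00\t100\t39\t0\t1\t100\t1\t100\t2e-30\t120\n"
)
hits = list(significant_hits(sample))
print(hits)
```

With this cut-off the RL40 pair is rejected although both sequences belong to the same family, which is the kind of case that the TBlastN cross-validation and the manual inspection of alignments are meant to catch.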
BlastP Hit between RL40_METJA (Query) and RL40_HUMAN
>SW:RL40_HUMAN P14793 60S RIBOSOMAL PROTEIN L40 (CEP52). 10/2001
Length = 52
Score = 31.6 bits (70), Expect = 1.8
Identities = 18/34 (52%), Positives = 20/34 (57%), Gaps = 3/34 (8%)
Query: 13 KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK 43
          K IC +C AR   RA  CR  KCG+   LRPK K
Sbjct: 17 KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK 50
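The identity and gap counts of this hit can be recomputed directly from the two aligned strings. The sketch below is only a check on the reported figures; the function name is illustrative, and the "Positives" value is not recomputed because it depends on the substitution matrix.

```python
# Aligned sequences copied from the BlastP hit above ("-" marks a gap).
query = "KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK"
sbjct = "KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK"


def alignment_stats(a, b):
    """Return (alignment length, identical columns, gapped columns)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(x == y and x != "-" for x, y in zip(a, b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(a, b))
    return len(a), identities, gaps


length, identities, gaps = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{length} ({100 * identities // length}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")
```

Integer division is used so that the percentages are truncated, which matches the 52% and 8% printed in the hit.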
Small size and biased composition of r-proteins
Difficulty of protein detection by similarity search
Genes often missed during annotation process
A complex Last Universal Common Ancestor ?
Interdomain distribution
[Figure: tree of the three domains annotated with r-protein losses.
Bacteria: Gram positives, Proteobacteria, Cyanobacteria, Chlamydia, Thermotoga, Aquifex, Deinococcus, Spirochaetes
Archaea: Halobacterium, Methanobacterium, Methanococcus, Pyrococcus, Aeropyrum, Archaeoglobus, Thermoplasma, Methanopyrus, Pyrobaculum, Sulfolobus
Eucarya: Diplomonads*, Microsporidia, Trichomonads*, Flagellates*, Ciliates*, Plants, Fungi, Animals
r-proteins marked on the tree: L38e, L13e, S25e, S26e, S30e, L14e, L34e, L30e, LXa, L35ae, S1p, S21p, L25p, L30p, S22p, S21e, L28e]
Ribosomal protein losses in each of the three domains
Full circles indicate proteins absent in all complete genomes investigated in the indicated taxon. Empty circles stand for proteins absent in some complete genomes of the indicated taxon.
• Prevalence of r-proteins within the universal pool, which may have been present in the last universal common ancestor (LUCA)
• Specialization of bacterial versus archaeal/eukaryotic ribosomes
• The majority of archaeal and eukaryotic r-proteins appeared before the split between Archaea and Eucarya, suggesting a complex cenancestor
Reductive evolution as a general trend in Archaea? In prokaryotes?
A complex Last Universal Common Ancestor (LUCA) ?
the 30S ribosomal subunit of Thermus thermophilus (5) (back side)
[Figure: labelled proteins of the large ribosomal subunit: L2p, L3p, L4p/L4e, L5p, L6p, L9p, L11p, L13p, L14p, L15p, L16p, L17p, L18p, L19p, L20p, L21p, L22p, L23p, L24p, L25p, L27p, L28p, L29p, L30p, L31p, L32p, L33p, L34p, L35p, L36p]
the 50S ribosomal subunit of Deinococcus radiodurans (6) (crown view rotated by 180°)
Localisation in the 3D structures
“Striptease” of the archaeal ribosome
Bacteria-specific proteins (colored in different shades of red) are preferentially located at the periphery of the ribosome
Abstract
Ribosomal gene detection: cross-validation needed!
[Flowchart: several representatives for each protein family → BlastP (proteins) and TBlastN (66 complete genomes) → homology detection, analysis]
Protocol of ribosomal gene detection
102 r-protein families
Creation of 24 missed genes
[Figure axes: complete genomes (45 Bacteria, 14 Archaea, 7 Eucarya) and r-protein families]
Protein detected by:
• 100% of the family representatives in both blastp and tblastn
• >50% of the family representatives in blastp
• <50% of the family representatives in blastp
• 0% of the family representatives in both blastp and tblastn
• 0% of the family representatives in blastp but detected by tblastn (gene missed during annotation process)
Validation of protein sequences for each family
[Figure: genomic context. Gene labels per genome:
Mt: L18P S5P L30P L15P SECY L19E ... CMK L14E L34E Hyp ADK TRUB
Ap: SECY ADK CMK L14E L34E GATA Hyp
Ss, St: L18P S5P L30P L15P SECY L19E ... CMK L14E TRUBa TRUBb L34E Hyp ADK
Mk: L18P S5P L30P L15P SECY L19E CMK L14E L34E Hyp ADK
Py: CMK L14E L34E TRUB GATA
Pa, Ph, Pf, Mj: CMK L14E L34E]
Genomic context analysis
Multiple alignment of complete sequences
• Coherence of the protein family
• Elimination of false-positives
• Correction of protein sequences
All the alignments are available at http://www-igbmc.u-strasbg.fr/BioInfo/Rproteins
Progressive elimination of 10 r-proteins (15%) in the course of archaeal evolution
First example of reductive evolution at domain-scale
[Figure: three candidate scenarios for the tree of the three domains (E: Eucarya, A: Archaea, B: Bacteria):
Bacterial rooting (E A B): simple ancestor(s)
Symbiosis (A E B): simple ancestor(s)
Eucarya rooting (E A B): complex ancestor(s)]
Which evolutionary scenario ?
References:
1 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.
2 Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) Nucleic Acids Res., 30, 3059-3066.
3 Thompson,J.D., Thierry,J.C. and Poch,O. (2003) Bioinformatics, 19, 1155-1161.
4 Thompson,J.D., Plewniak,F., Ripp,R., Thierry,J.C. and Poch,O. (2001) J. Mol. Biol., 314, 937-951.
5 Wimberly,B.T., Brodersen,D.E., Clemons,W.M.,Jr., Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Nature, 407, 327-339.
6 Harms,J., Schluenzen,F., Zarivach,R., Bashan,A., Gat,S., Agmon,I., Bartels,H., Franceschi,F. and Yonath,A. (2001) Cell, 107, 679-688.