
Evolution of the Large-Subunit Ribosomal RNA Binding Site for Protein L23/25

Anne Chenuil,* Michel Solignac,† and Michot Bernard‡
*Laboratoire Génome et Populations, CNRS, Université de Montpellier II, France; †Laboratoire Populations, Génétique et Evolution, CNRS, France; and ‡Laboratoire de Biologie Moléculaire Eucaryote, CNRS, France

The region of the large-subunit rRNA encompassing the D7 divergent domain is organized within eukaryotes as a patchwork of short conserved secondary-structure features interspersed with more rapidly evolving sequences. It contains the attachment site of protein L25 (E. coli L23), which binds rRNA in the first stages of ribosome assembly, suggesting that this region is crucial for ribosome biogenesis and function. A better understanding of its roles requires good knowledge of its mode of structural variation during evolution. To this end, we sequenced the D7 region of 24 new invertebrate species belonging to annelids, molluscs, arthropods, and eight other deep-branching invertebrate phyla. Their comparison allowed us to propose refinements to previous eukaryotic folding models. A detailed analysis of the pattern of variation at each position, both within the D7 region and along the L23/25 sequence, with reference to previous heterologous binding experiments, gives new insight into the rRNA-protein contacts. We identified in the D7 region and in L23/25, respectively, six and five positions presenting a pattern of variation compatible with experimental results, three of which show coincident variations that support their possible involvement in rRNA-L23/25 binding.

Introduction

The sequences of mature ribosomal RNA molecules are organized as a mosaic of regions that differ tremendously in their rates of variation. In contrast with small-subunit rRNA, large-subunit rRNA (LSU rRNA) shows a more complex pattern of sequence variation and a larger increase in size from prokaryotes to higher eukaryotes, with dramatic expansions in vertebrates (Clark et al. 1984; Hassouna, Michot, and Bachellerie 1984; Gorski, Gonzales, and Schmickel 1987). Nevertheless, at the secondary-structure level, a universal core has been identified in all living species; it encompasses not only the best-conserved sequences but also several nonconserved sequence regions. This universal core is organized in a dozen domains whose folding is common to all species. Detailed knowledge of the secondary-structure interactions arose essentially from phylogenetic comparisons, which consist of a systematic search for compensatory substitutions in an alignment of homologous sequences. This approach is the most powerful way to reveal biologically relevant structural features (Woese and Pace 1993; Gutell 1996). The conserved domains (about 2,700 nucleotides) are likely to be involved in basic ribosomal functions through molecular interactions that should remain largely homologous between prokaryotes and eukaryotes. Evidence has now accumulated associating several structural features with fundamental aspects of the translational mechanism. As for the rapidly evolving domains, which accommodate almost all of the size variation of this molecule and make up less than 10% of the rRNA length in prokaryotes but up to 50% in mammals, they pose intriguing questions about their organization within the eukaryotic ribosome, their mode of structural variation, and their potential function in modulating ribosome biogenesis and/or function. Yet these divergent domains have been explored only scarcely, and their secondary-structure folding is far from definitively elucidated. There are two main reasons for this limited interest. The first is that the high variability of divergent domains suggests that they could be dispensable, their presence in today's rRNA being tolerated only because they do not disrupt ribosomal function (Gerbi 1985). The second is experimental: elucidating their mode of structural variation and identifying diversified structural constraints, if any, requires numerous additional sequences from various taxa.

Key words: large-subunit rRNA, secondary structure, evolution, sequence, divergent domain, invertebrates, protein-rRNA interaction.

Address for correspondence and reprints: Michot Bernard, Laboratoire de Biologie Moléculaire Eucaryote, CNRS, 118 route de Narbonne, 31062 Toulouse cedex, France. E-mail: bmichot@ibcg.biotoul.fr.

Mol. Biol. Evol. 14(5):578-588. 1997. © 1997 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Using the comparative approach, we have already identified subsets of secondary-structure features that are specific to large phylogenetic groups and preserved within these groups despite extensive sequence variation: in the three divergent domains D2, D3, and D8 (Michot and Bachellerie 1987; Michot, Qu, and Bachellerie 1990), in the 3' end of the LSU rRNA (Bachellerie and Michot 1989), and in an extra domain of alpha proteobacteria (Otten et al. 1996). These lineage-specific structural constraints suggest that these regions do have functions, which must have diversified significantly during the evolution of the major groups of organisms. Experimental support was provided by in vivo analysis of Saccharomyces cerevisiae and Tetrahymena thermophila recombinants containing inserted or deleted segments in two divergent domains: a 19-nt insertion in V3 of the S. cerevisiae 17S rRNA prevents the appearance of mature 17S rRNA (Musters et al. 1990), and a 119-nt sequence inserted in D8 of the T. thermophila 28S rRNA prevents growth of the transformants (Sweeney and Yao 1989). Remarkably, the complete deletion of D8 in T. thermophila can be rescued by replacing it with D8 from other organisms (Sweeney, Chen, and Yao 1994). This evidence for an essential, evolutionarily conserved function of one divergent domain containing group-specific structural constraints suggests that other divergent domains presenting diversified structural constraints may also acquire and conserve essential functions during evolution, despite their dramatic potential to change in size and sequence. This finding makes it all the more worthwhile to characterize the mode of structural variation of the rRNA divergent domains in order to understand their structure-function relationships. This detailed knowledge is also essential for exploiting the potential of divergent domains in the reconstruction of short-range phylogenetic relationships (Qu, Nicoloso, and Bachellerie 1988; Perasso et al. 1989; Rousset, Pelandakis, and Solignac 1991; Pelandakis and Solignac 1993). Moreover, it allows group-specific secondary-structure features to be used as phylogenetic signatures.

Table 1
Systematic Positions of the Species Sequenced

Group                    Phylum          Class          Species
Protostomian coelomates  Annelida        Polychaeta     Owenia fusiformis, Nephtys hombergii, Neanthes diversicolor
                                         Clitellata     Hirudo officinalis, Lombricidae (fam.)
                                         Archiannelida  Protodrilus ciliatus, Saccocirrus major, Parenterodrilus taenioides, Nerilla antennata
                         Myzostomida                    Myzostomum sp.
                         Echiuroidea                    Bonellia viridis
                         Mollusca        Bivalvia       Pecten maximus, Nucula nucleus
                                         Gastropoda     Buccinum undeus
                                         Cephalopoda    Nautilus macromphalus
                                         Aplacophora    Aplacophore
                         Arthropoda      Myriapoda      Cylindroiulus sp.
                                         Arachnida      Pholcus phalangioides
                                         Antennata      Apis mellifera
?                        Brachiopoda                    Terebratulina caputserpentis
Deuterostomian           Chaetognatha                   Spadella sp.
Procoelomates            Rotifera                       Notommata copeus
                         Acanthocephala                 Acanthosentis tilapiae
Acoelomates              Platyhelmintha                 Dendrocoelum lacteum

NOTE: Myzostomids are considered either a phylum or a class of annelids. The status of Brachiopoda is uncertain (?): like Chaetognatha, they are often considered deuterostomes.

Using the predictive power of the comparative approach, we focus here on the structural variation of the region encompassing the D7 divergent domain, located in the central domain of the LSU rRNA (domain III). For this region we have sequenced 24 new species from 10 deep-branching invertebrate phyla that have so far been poorly studied, if at all. The phylogenetic range of taxa was chosen to cover the major evolutionary radiations among annelids, molluscs, and arthropods, whose relationships remain ambiguous (Philippe, Chenuil, and Adoutte 1994). The structure-function relationships of this D7 region are of particular interest for two reasons. First, this divergent domain is a complex split expansion segment organized in several variable subdomains (Michot, Hassouna, and Bachellerie 1984; De Lanversin and Jacq 1989) whose precise secondary-structure folding is not definitively established. Second, one of these variable subdomains, which contains hidden breaks in insects and in several lower eukaryotes (Gerbi 1996; Gray and Schnare 1996), interrupts universally conserved features involved in the binding of yeast ribosomal protein L25 (L23 in E. coli). This ribosomal protein, one of the few that bind the LSU rRNA directly in the first stages of ribosome assembly, certainly has essential roles in the following steps (Kooi et al. 1994). In addition, L23 is located within the A-site domain of the peptidyl transferase center on the 50S subunit (Grant et al. 1979) and probably in close proximity to the mRNA-binding site at the 3' end of the eubacterial 16S rRNA (Dabbs 1980), suggesting that the D7 region lies in or around rRNA domains that are essential for ribosomal function. To shed more light on the complex mode of structural variation of this region of the rRNA, we revisited and refined the previous secondary-structure model using our 24 new invertebrate sequences. We then identified, from their pattern of variation in both the rRNA and the protein, positions that could be involved in the species specificity of the binding between LSU rRNA and L23/25.
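The notion of positions that vary "coincidentally" in the rRNA and in the protein can be made concrete. The Python sketch below uses one possible criterion, which is an assumption of this illustration rather than the authors' stated procedure: an rRNA position and a protein position vary coincidentally when both are variable and split the shared species into exactly the same groups. The species labels, character states, and function names are hypothetical.

```python
def partition(states):
    """Group species by the character state they carry at one position."""
    groups = {}
    for species, state in states.items():
        groups.setdefault(state, set()).add(species)
    # A set of frozensets ignores which symbol labels each group.
    return {frozenset(group) for group in groups.values()}


def coincident(rrna_column, protein_column):
    """True if both positions vary and split the shared species identically."""
    shared = rrna_column.keys() & protein_column.keys()
    p_rrna = partition({s: rrna_column[s] for s in shared})
    p_prot = partition({s: protein_column[s] for s in shared})
    return len(p_rrna) > 1 and p_rrna == p_prot


# Toy data: hypothetical species sp1-sp4, not values from the paper.
rrna_site = {"sp1": "G", "sp2": "G", "sp3": "A", "sp4": "A"}
protein_site = {"sp1": "K", "sp2": "K", "sp3": "R", "sp4": "R"}
other_site = {"sp1": "K", "sp2": "R", "sp3": "R", "sp4": "R"}

print(coincident(rrna_site, protein_site))  # True
print(coincident(rrna_site, other_site))    # False
```

With real data, the rRNA column would come from the D7 alignment and the protein column from an L23/25 alignment; a candidate flagged this way would still have to be checked against the heterologous binding results.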

Materials and Methods

Biological Samples (Table 1)

Bonellia viridis is a Mediterranean Sea species provided by the Oceanological Observatory of Banyuls, CNRS, France. Myzostomum sp. and Acanthosentis tilapiae come from the Atlantic Ocean and were provided by Dr. Mattei (Université de Corte, Corsica). The aplacophoran was provided by Dr. Tillier (Muséum National d'Histoire Naturelle, Paris), and Notommata copeus was provided by Dr. Pourriot (Université Paris VII). All other species used in this work come from the English Channel and were provided by the biological station of Roscoff (Université Paris VI, France). Archiannelids were collected and identified by Claude Jouin (Université Paris VI).

DNA Extraction, Amplification, and Sequencing

DNA was extracted from animal tissues following Kocher et al. (1989). Five oligonucleotides were used for double- and single-stranded PCR. c71 (5'-TGGTGGTAGTAGCAAATATT) is located immediately downstream of the D7 domain, from positions 2193 to 2212 in the rat sequence according to Chan, Olvera, and Wool (1983); c72 (5'-GTGCAGATCTTGGTGGTAGT) and c7X (5'-GTGCAGATCTTGGTGGTAGTAGCAAATA) cover rat positions 2183-2202 and 2183-2210, respectively; c9 (5'-TACTTAAGAGAGTCATAGTT) and c9S (5'-A(G/A)ATGACGAGGCATGCGGCTACCTTA) were used as opposite-strand primers and cover rat positions 3503-3483 and 3522-3497, respectively.
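The primer sequences and their stated rat coordinates can be cross-checked mechanically. The Python sketch below assumes that coordinates are inclusive and that "(G/A)" in c9S denotes a two-fold degenerate site; it expands degenerate primers into concrete sequences and compares each length with the span the primer is said to cover. It checks c71, c72, c7X, and c9S; the helper names are hypothetical.

```python
from itertools import product

# Primers as printed in the Methods: (sequence 5'->3', rat start, rat end).
# Coordinates are taken as inclusive; "(G/A)" marks a two-fold degenerate site.
PRIMERS = {
    "c71": ("TGGTGGTAGTAGCAAATATT", 2193, 2212),
    "c72": ("GTGCAGATCTTGGTGGTAGT", 2183, 2202),
    "c7X": ("GTGCAGATCTTGGTGGTAGTAGCAAATA", 2183, 2210),
    "c9S": ("A(G/A)ATGACGAGGCATGCGGCTACCTTA", 3522, 3497),
}


def expand(primer):
    """Return every concrete sequence encoded by a primer with (X/Y) sites."""
    choices = []
    i = 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            choices.append(primer[i + 1:close].split("/"))
            i = close + 1
        else:
            choices.append([primer[i]])
            i += 1
    return ["".join(combo) for combo in product(*choices)]


def span_length(start, end):
    """Positions covered, for forward (start < end) or reverse primers."""
    return abs(end - start) + 1


def gc_fraction(seq):
    """Fraction of G and C in a concrete (non-degenerate) sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, (seq, start, end) in PRIMERS.items():
    variants = expand(seq)
    fits = all(len(v) == span_length(start, end) for v in variants)
    print(f"{name}: {len(variants)} variant(s), {len(variants[0])} nt, "
          f"fits span: {fits}, GC = {gc_fraction(variants[0]):.2f}")
```

Handling the degenerate site by explicit expansion keeps the check simple; a fuller tool would accept the standard IUPAC ambiguity codes (R for G/A) instead of the slash notation.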

Double-stranded PCR amplifications were performed in 100 µl with 10 µl of 10X buffer (Promega), 1.5 mM MgCl2, 200 µM of each neutralized dNTP, 30-100 pmol of each oligonucleotide, 2.5 units of Taq polymerase, and 1-5 µl of DNA extract. Asymmetric PCRs were performed with one-tenth of the purified, presumably primer-free, double-stranded amplified product, with 1.5 pmol of the limiting primer and 30 pmol of the other primer.
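The reaction composition above translates into pipetting volumes through C1V1 = C2V2. The Python sketch below does that arithmetic for one 100-µl double-stranded reaction. The stock concentrations (25 mM MgCl2, a premix with 10 mM of each dNTP, 10 µM primers, 5 U/µl Taq) are hypothetical choices, not values given in the paper, and 30 pmol of each primer and 5 µl of extract are simply picked from the stated ranges.

```python
# Hypothetical stock concentrations: the paper gives only final amounts.
TOTAL_UL = 100.0             # reaction volume stated in the Methods
MGCL2_STOCK_MM = 25.0        # assumed MgCl2 stock (mM)
DNTP_STOCK_UM = 10_000.0     # assumed premix, 10 mM of each dNTP (in µM)
PRIMER_STOCK_PMOL_UL = 10.0  # assumed 10 µM primer stocks (10 pmol/µl)
TAQ_STOCK_U_UL = 5.0         # assumed enzyme concentration (U/µl)


def stock_volume(final_conc, stock_conc, total_ul=TOTAL_UL):
    """C1 * V1 = C2 * V2, solved for V1 (units of C1 and C2 must match)."""
    return final_conc * total_ul / stock_conc


def double_stranded_mix(primer_pmol=30.0, dna_ul=5.0):
    """Volumes (µl) for one 100-µl reaction; water makes up the remainder."""
    mix = {
        "10X buffer": stock_volume(1.0, 10.0),  # 10 µl of 10X -> 1X
        "MgCl2": stock_volume(1.5, MGCL2_STOCK_MM),
        "dNTP premix": stock_volume(200.0, DNTP_STOCK_UM),
        "primer 1": primer_pmol / PRIMER_STOCK_PMOL_UL,
        "primer 2": primer_pmol / PRIMER_STOCK_PMOL_UL,
        "Taq": 2.5 / TAQ_STOCK_U_UL,
        "DNA extract": dna_ul,
    }
    mix["water"] = TOTAL_UL - sum(mix.values())
    return mix


# Asymmetric PCR: 30 pmol of the excess primer vs 1.5 pmol of the limiting one.
ASYMMETRY = 30.0 / 1.5

for component, volume in double_stranded_mix().items():
    print(f"{component:12s} {volume:5.1f} µl")
print(f"primer ratio in asymmetric PCR: {ASYMMETRY:.0f}:1")
```

The 20:1 ratio is what drives single-stranded product in the asymmetric step: once the limiting primer is exhausted, only the strand primed by the excess oligonucleotide keeps accumulating.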

When readable sequences could not be obtained directly from the PCR-amplified DNA, we cloned the fragment. PCR products were incubated for 45 min with 10 units of polymerase I and 10 units of polynucleotide kinase in 500 mM Tris-HCl (pH 7.5), 100 mM MgCl2, 10 mM dithiothreitol, 200 µM of each dNTP, and 0.5 mM rATP. They were then ligated into the plasmid vector pUC18, previously digested with SmaI and dephosphorylated (Pharmacia). Competent E. coli strain DH5α (Gibco-BRL) was transformed, and a quick DNA miniprep procedure was used to prepare templates for the sequencing reactions (the T7 sequencing kit of Pharmacia was used for both PCR and plasmid DNA sequencing) with primers c7, c71, or c7X. Sequences obtained in this work have been assigned the GenBank accession numbers U52978-U53002.

Sequence Analysis

In the first step, sequences obtained for the D7 region were aligned with the corresponding region of a specialized, structured databank containing sequences of the conserved core of LSU rRNA secondary structure for eukaryotic species. This databank integrates a description of the secondary-structure folding (unpublished data). D7 sequences were aligned on the basis of both primary- and secondary-structure homologies with RNAlign (Corpet and Michot 1994), which also identifies the precise junctions between conserved and divergent domains. In the second step, the alignment of the region encompassing the D7 divergent domain was revisited in light of the 24 new sequences determined in the present work, essentially by eye, with the help of a multiple-alignment editor (F. Corpet, personal communication) derived from Multalin (Corpet 1988), which allows visual identification of compensatory base changes. Secondary-structure interactions preserved within groups of species sharing unambiguous sequence homologies were compared between groups, and the alignment was modified to maximize secondary-structure homologies. Secondary-structure drawings were produced with RNA-d2 (Perochon-Dorisse et al. 1995), with additional labeling done in Designer (Micrografx Corporation). The secondary-structure consensus and the identification of covarying positions were obtained with ESSA, a Unix program for analyzing RNA folding (Chetouani et al., personal communication).
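The compensatory base changes sought in the alignment editor, and the covarying positions reported by ESSA, rest on a simple test that can be sketched in a few lines. The Python below is a hypothetical illustration, not ESSA's algorithm: for two alignment columns it reports how often the residues can pair and whether pairing is maintained by changes on both strands. The toy alignment and function names are made up, and counting G·U as a valid pair is an assumed convention.

```python
# Pairs accepted as "pairing": Watson-Crick plus the G.U wobble. Treating G.U
# as valid is a common convention, assumed here.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
GAP = "-"


def evaluate_pair(alignment, i, j):
    """Score a putative base pair between alignment columns i and j.

    Returns (fraction of ungapped sequences that can pair,
             number of distinct pair types observed,
             whether a compensatory change is seen).
    A compensatory change means both columns vary while pairing is kept,
    e.g. G-C in one species and A-U in another.
    """
    observed = [(seq[i], seq[j]) for seq in alignment.values()
                if seq[i] != GAP and seq[j] != GAP]
    if not observed:
        return 0.0, 0, False
    pairing = [p for p in observed if p in VALID_PAIRS]
    five_prime = {a for a, _ in pairing}
    three_prime = {b for _, b in pairing}
    compensatory = len(five_prime) > 1 and len(three_prime) > 1
    return len(pairing) / len(observed), len(set(pairing)), compensatory


# Toy alignment (hypothetical sequences): columns 0 and 4 form a stem base
# pair; columns 1 and 3 sit in a loop and never pair.
TOY = {
    "sp1": "GAAAC",
    "sp2": "AAAAU",
    "sp3": "CAAAG",
    "sp4": "GAAAU",
}

print(evaluate_pair(TOY, 0, 4))  # (1.0, 4, True)
print(evaluate_pair(TOY, 1, 3))  # (0.0, 0, False)
```

A pair that is always possible but never changes on both strands (for instance only G-C and G-U) is compatible with the model without being proof of it, which is why double changes carry the most weight in the comparative approach.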

Results and Discussion

Mode of Structural Variation over the D7 Region

We have determined 24 new sequences for the D7 region within 10 deep-branching phyla of invertebrates (table 1). For eight of these phyla (i.e., all except Platyhelmintha and Arthropoda), our data represent the first sequences of the central domain of the 28S rDNA. The deepest phylum is represented by Dendrocoelum lacteum (Platyhelmintha); two phyla belong to the procoelomates, whereas the other branches belong to the coelomates, more precisely to the protostomians, with the exception of Terebratulina caputserpentis (Brachiopoda), whose status is unclear, and Spadella sp. (Chaetognatha), which is considered a deuterostomian. Annelids were thoroughly sampled (eight species from three classes), and all classes of molluscs (except Scaphopoda) were represented. For arthropods, we chose to sequence a myriapod and a chelicerate in addition to a new insect, Apis mellifera. This enriched phylogenetic spectrum of invertebrate sequences, which now contains several species in each of three phyla (Annelida, Mollusca, and Arthropoda), gave us the opportunity to examine in detail the constraints exerted on the primary and secondary structures of this region. In particular, it became possible to address the mode of structural variation in these rapidly evolving domains among metazoans.

The sequenced region extends between positions 2190 and 2352 of the r-RNA mouse sequence (fig. 1). Its 5’ end lies in the highly conserved single-stranded loop upstream of stem G, which belongs to the universal core of secondary structure. For 16 species sequences extend 3’ downstream of the 6-nt tract which constitutes the 5’ strand of the universal stem M encompassing the two divergent D7 subdomains called D7a and D7b. For the eight other species, we obtained shorter sequences ending between D7a and D7b. By contrast, Nuculu, Hi- rude, and Protodrilus sequences extend further and cov- er the 3’ part of stems M, L, and G (not shown). In the first step, comparisons were performed within metazo- ans for which most of the D7 region can be anambi- guously aligned, all the length variation being restricted to three subdomains called D7a, D7a’, and D7b. In all

Evolution of the LSU rRNA Binding Site for L23/25 581

D7a D7a’ GH I J J Jl Jl 52

2190 ,~+~++e--+f-~

52

XC.

.__.

._A.

v* ._A.

Z-G* __A.

._A*

__A.

.-Vu

:__.

__A.

__“.

._A.

._A.

._A. __G.

__A.

__A.

4-A’ __A.

.__*

J__.

__A’

___.

J-C?

__p

__“.

__G.

__G.

__G.

_C_.

Vert. : I

,

l 0. fusiformis _____ u__ c__ l N. bomber@ _____ __A ___ 0 H. offkinalls _____ ___ CA_

ADD. : kyibrldada&fam)----- --A --- ____o __. . . .

l P:ciliatus _o-_- ___ ___

l P. taeniddes _ __u ___ l S. major _____ __A ___

l N.antennata _ _ _ _ . _ - - - -

:AG” l GCC l CW

P-AC l ___ l ___

J-A_ .“__ t ___

_-AC ‘A__ l ___ _-AC .“__ l ___

J-AA *V-G l --A

--AG ‘V-V l “C-

--AC *V-V l “C- b-AC .“__ . ___

_-AC AA__ . _C.

J-AA *CA= l CG- __A_ ‘“__ . ___

\-AC .A__ l ___

\-AC .“__ . ___

___A .“__ . ___

__A_ .“__ . “_A _-AC .“__ l ___

*

\-A_ .“__ . ___

_-AA .“__ . ___

_-AA .“__ . “A_

---A *V-V V A-G _-AA .__G l AA.

_-AA .__” . ___

\-AC .“__ . ___

_-AC .__A l “_C J-AC *“_A . ___

_-AC l ___ . V-G

__M ..__ . ___

m-AG *VU- l --G

A__C .“__ . ___

_-AA ‘C__ .T

U-AC l C-- l GG*

“GA C ZAG AVG

-A- -A- :A- -A- -A- -A- -A-

-A- -A-

-A- -A-

-A- -A- -A- -A- -A-

-A- -G- -A-

-A-

-A- -A-

-A-

-“-

-A-

-A-

-A-

-A-

-A-

-A- -A-

-A-

__ __A

__ __A

A V- --A v- --c C- GA- NN GA- __ __A

__ c__

G- -A” __ __A

A __ __A

__ __A

-c -AA __ __A

GA --”

G- --A CC C-A

C” CAA

.“V CAA

.C” V-A

__ __A

UC --A

CA --C

GV -cv __ __A

.__ -AC

.__ ___

.__ “C_

_G_ _c_

-GC GA- -

‘CCCA AGG l

._G__ __c .

.____ _AC l

._“-_ --c .

._“__ __c .

*-A-- GAC l

._G_. ___ l

._GN. ___ .

.____ __c .

*“-A” l -V A

-GG-C -GG-- --C-V

-GG-C -GG-V

-“-AA -GG--

AGG-C

l GVAV GV- l -CG-V ._“__ C-A . A--N-

._“__ -AC l

.____ -AC l

.____ _Ac_- l

‘“AC- -A” l

.A___ __C .

-GC--

-N-C -GG--

AVG-A -c-c-

._G__ _A” .

.____ “_C .

l G”A- -“- l

.“___ “_” .

*AA”” l W l

“AA”” “AU l

“G---

CGC-A

ACGV- -G--V

C-A-V

-WAC

.____ _AC .

.-GA_ __C .

*A-A- GA” l

l VAVC -UC l

‘_“-- ___ l

‘AW_ __c .

-GG--

“GG-A

AGA-- -G”--

-CGCA

-CGIN

V-AA- G-C l

..A__ GA_ .

CVGAU

CG-VT

CGCGA

XCCCUCGGCCG

..*‘..CC*“TJ

**....A”““”

‘*‘*‘*GCCC”

‘****‘“ACC” . . . . . ..C”G”

. ..‘..‘*.“C

..‘...CG”W

..‘..“...G

l ****ACE”*”

‘**GACNVGUG

l ***GvvvvNv

l ****“C”“C” . . . . ..C”A””

.********“G

. . . . . ..GCC”

. . . . . ..“A”G

.‘....~(JJG

. . . . ..A&-

.**...pJJwG

'AAUAAWWG .*....*.-

.*.***AGC””

*****GGAACG . . . . . . ..“W

.*....*@-J”

l ‘***vcvCCv l *********p,

l “*‘GCCACC . . . . . . ..MC

. . . . . . ..“GC

a Myzostomumsp. -“--- -_- --- a B. viridis I II

_____ __. .-.

i

l P.mazimtn

Mol. l N. nucleus l B. undeus

a Aplaeophore

Cylindroiubn P. phalanglddes A. mellifera B. mori D. melanogaster S. coprophila

T. caputserpentii Spadella D. lacteum C. elegans N. copeus A tilapiae

___ c__

___ ___

__A _._

__A ___

___ ___

__- -_-

___ ___

-_A ___

___ ___

__A ___

__A _..

__A ___

_____ _“___

_“___

_____

____c

_“___

Bra. ’ Cha’. Plat$> 0 Nem. _’ Rot__--- l Aca -*

_____ _____

Eub. E. coli G-M l GG- UN ~“~.~.~~“....~~.~.~~~....~......~.........~~...........~.~.~..~.~......~.......~.......

D7b IHK K L

f-e+ f-- I iiic --. __. __. __. -G-

CG-

Gc-

c--

“C- __.

__.

__.

G-- __.

-c- -c- “-< : -G-

-A. -G-

__.

-G.

-G- G- __.

__.

__.

V-

cc

G-

-

CMA

____

___. _._-

_.__

A___

A___

____

____

____

R ____

____

____ ____

____

____

____

____

____

____

____

____

____ ____

____

____

____

____

_ _“.

4G

-A

:A -A

-A

-A

-A

-A

-A

-A

-v -A

-A

-A -A

-A

-A

-A

-A -A

-A

-A

-A

-A

-A

-A

-P

GP

-P

-A

JCGGC

_____

Z--N_

:_c_ _

;VCV-

_____

-GA-- ‘*WCACG “ACGC*

I

-N--P _____ _____

___A_ __---

__AW__“. _____ cc -_-NC”-C ..--pq”“_

__A_“__“. _____ _c _____CAC . .._____ “GG”~~“..*.....*.‘*‘*.....*.........~~~G~~~~~”~”

__A_“__“. _____ -C -----CAC l **----- VNCCNACVVCGGGVCCACVC __A_“__“. _“___ _c ____C”_C *=-----A ACGMACAWAA’=*~***~‘****~*W~AW~“~”~~~~”~A __A_“__“. _____ -C -A-A-CA- **---V-C VGCC . . . . . . . . . . ..‘....‘............A~~~~A~“C~AC~A

__A__

RG---

__A_“__“. __“__ _C _A___CAC ..A__AG_ CGGVCVCUGCCVAACCVGAC**CVCVGGCCAGGCGGCGAGACCCGVGC -_A-_-_“. ___C” _C _____“GC ..______ C~,$~AC............................~”~~~~”~~~”AA~

__C_.

-AC-- -AC--

---CV--V’ --G-- -C ----CAGC l *---ACC GWVCMUAAVCGWCC*****CVCGVCVWAMMCGAGVGWCGAC

_____

__C__

__A_.

_A___

_____

____.

_--CA__“. __G”_ _C _“_W_A_ l *VAVCCG WVGVVAVVA*‘~“~~~~*~=~~=~~~MVAV~~CVCGVGCVCAVC ---CA--“* --GV- -C -“-W-A- l *“A”ACG UITVG”UUAWAA~***‘*‘******A~“~G~“~WAWA~A~”~A”~

__A_“__“. _____ _c _G___CAC .*_____c ~~~“~~‘.........‘..................W~~~~A”~~A~V

-CA-“--“’ --G-- GC A-A-CVGC **V----C CACGGCAGCCG”C”GCG __A_“__“. __“__ _C W_A_CAC ..______ WGC”““..‘.....................~AWC~”CAACAC~~

__A_“__“. ___“_ _C “CA__“GC .*AA____ “~“~““....‘......................~VV~~A~CA‘.AA~C

_ _A_“_ _“. _____ _C ____“C~ ..______ CGG”G .***~****.‘**...*.**.~***.~.*.W”~~~~A~~A~

__A_“__“. ----- CC -A---“AC *C-V”--- GWCAACAWCCAGAAG”GCC”C*GVCCAWCA”GG”G””G”GCA”AA

__C__ __A____“. __‘&_ _C W-_-VA” ..___WC W~............*...****.*.............*....*.**.

__c__ _-A-A__“. _____ _c _r&__CgJ ..__C_G_ “G . . . . ..t.....t....*..*...*..***.........**..**.

____. __A____“” ___._ m _G___CG” **_-C-G_ “~‘..*.*.*......*....**.........*..***...**.**.*

A-A- -

vll Ml M 2352

E-c : ,

:Gh I __

__ V-

__

__

__

_. __

__

_.

G-

‘V- :“-

V-

“- __

I--

V-

L- -

i- -

i- - -

CGC

-cc --A --A -NA -A-

___

-liA

-NA

___ ___

___ ___

-c- -AG -AG

-A-

-A-

--A -A-

--V

--V -

GACCGAV CCCGGA GAA

n A-A-_-A __A___ __C

M. muscuhu

a 0. fusiformb a N. hombergli 0 8. offiiinatls a Lombricidae (fam] l N. diversicdor a P. ctiiatus a P. taeniotdes l S. major a N. antennata

l Myzostomum sp. l B. vlridis

a P.mazimtu l N. nucleus l B. undeus l N. macromphalus l Aplacophore

l Cylindroiulus l P. phalan@ides a A. melllfera

B. morl D. melanogaster S. coprophila

l T. caputserpentis l Spadella l D. lacteum

C. elegans a N. copeus a A. tilapiae

S. cerevblae A. thallana 0. sativa

E. coli

OVAP

____ _c__

____

_c__

_c__

_c__

____

____

_c__

___.

-CG-

____

_c__

_c__

_c__

____

____

_c__

-CG-

.-A---A --A--- _-c

._“---c -“_-_- -_c I I ._“.__C _“C___ __c

A_G_A_C _“G___ __

““G__ _ _-GA__ __c 4 I ._G”__A __G___ __C

. .A___A _“____ __C

A-A-_-C _“____ __”

A-_--_A _“_--- --c

A_A___A _A____ __c

A-_-_-G _“____ “_C

A-AGVGG --V-A- __A___C _A”AA_ ___

--A-_-C -A”A_- ___

A-A---A -“---- --c

A-C_“__ _“_A_

AW---A -W--V V-G ._A___A ______ __C

V-A-A-A ___A__ A_”

A__V__A VGV___ __c

“_GG__G V_____ __c

V_aA_G V_____ __C

FIG. l.-Alignment of the eukaryotic sequences over the D7 domain. Sequences determined in this work are identified by a dot. Only representatives of other major eukaryotic groups are included in this alignment. The mouse sequence (top line) serves as a reference, with nucleotide positions numbered from the 5’ end of the molecule. In other sequences, only substituted nucleotides are shown. Identities are denoted by hyphens, and deletions are denoted by asterisks. Base-paired nucleotides are boxed (thin line), whereas stems are delineated by opposite arrows and identified by a capital letter (G to M) according to De Lanversin and Jacq (1989). Sites of length variations are shown in thick boxes and identified D7a, D7a’, and D7b.

- -. -~ -- _- - __ - ----. --- -_

582 Chenuil et al.

species, the conservative regions, despite numerous changes in primary structure, fold in exactly the same secondary structure shape. Stems G, H, I, J, K, L, and M were previously identified as structural components of the conserved core, but, as revealed by variations in the relative size of stems and loops of the different mod- els in the 23S-like compilation, the folding for this re- gion was uncertain. Our new sequence data bring ad- ditional comparative evidence, allowing one to test ev- ery putative single base pair in each stem. Therefore, we have precisely identified those sequences which were actually conserved during the course of evolution. We have also refined the 5’ and 3’ ends of the more variable areas D7a, D7a’, and D7b owing to the identification within invertebrates of three stems, called Jl, 52, and Ml, which were already described in a few species (De Lanversin and Jacq 1989) but not proposed in the most recent 23S-like compilation (Schnare et al. 1996). Our work reveals that these stems are conserved over large phylogenetic distances through numerous compensatory base changes and, thus, are intrinsic features of the con- served core of secondary structure within metazoans. Stem Ml lies exactly seven nucleotides upstream of the universal stem M and is always 3 bp long. In contrast, stems Jl and 52 may tolerate little structural variation. Stem Jl, which usually consists of 3 bp, contains only two bp in archiannelids, in N. diversicolor, and N. ma- cromphalus. Its hairpin loop is 4 nt long in 28 species out of 32 and is a YNRA motif in 24 species. Imme- diately downstream, the 5-bp stem 52, which closes the D7a variable subdomain, may exhibit one mispairing in several species. When the comparison is extended to the whole eukaryotic kingdom, stems 52 and Ml can still be proposed in yeasts and plants. By contrast, stem Jl, which is also conserved among yeasts, cannot be pro- posed either in monocotyledons or in dicotyledons. 
In eubacteria, the region encompassing the D7 divergent domain is shorter and differs greatly in sequence. Nevertheless, at the secondary-structure level, the counterparts of eucaryotic stems G, H, I, J, K, L, and M can be identified. By contrast, eucaryotic domains J1 to D7a’ have no equivalent in eubacteria, in which this region is only 3 nt long. Stems M1 and D7b are also absent in eubacteria. There they are replaced by a segment that generally ranges from 4 to 10 nt but can sporadically be expanded to about 100 nt, in which case it folds into a giant unbranched hairpin stem.

The conserved areas of the D7 region within metazoans reveal a compact secondary-structure folding from stem G to stem L, organized in two four-branched internal loops (fig. 2). One contains stems I to J2, whereas the other is made up of stems G, H, K, and L. The thermodynamic stability of this D7 region comes from its high content of hydrogen-bonded nucleotides. It is further reinforced by the GC content of stem J2 and by the presence of three tetraloop stems, J, J1, and M1. In particular, the G(GNRA)C motif in the apical part of stem M1 corresponds to one of the most common tetraloops in rRNA. It is a particularly stable structural feature owing to base stacking and non-Watson-Crick hydrogen bonds between nucleotides of the loop (Heus and Pardi 1991; Varani, Cheong, and Tinoco 1991). This stem, highly constrained in eukaryotes, is absent in the eubacterial kingdom. In archaebacteria, stem M1 is preserved within thermophiles and methanogens but absent in all halobacteria sequenced so far. Stem J is also differentially constrained within several groups of eukaryotes. In insects, this stem is reduced to 2 bp through mispairing at its base. In the Rotifera, the Acanthocephala, and the Myzostomida, it is reduced by mispairing at the top of the stem. Interestingly, in each of these two situations, several combinations of mispaired nucleotides are observed, suggesting that a constraint is exerted against the presence of a base pair at these positions. The most conserved nucleotides within metazoans are dispersed along the sequence but in fact cluster in three places: the central four-branched internal loop between stems G-H-K-L, the internal loop which separates stems H and I, and the apical parts of stems K and M1.

FIG. 2.-Conserved secondary structures in the eucaryotic D7 domain. A consensus of secondary structure was derived from a universal eukaryotic alignment for the region located between stems E and D. Surrounding sequences, the folding of which is well known (Schnare et al. 1996), are replaced by a thin line. Nucleotides conserved in 90% of the species compared are given, whereas more variable positions are replaced by dots. Phylogenetically supported base pairings are identified by thick bars. Thin bars denote pairings which are generally possible but with several exceptions. Variable subdomains are depicted by thick boxes with indication of their size variation. Dotted boxes delineate areas which are absent in eubacteria as compared to eukaryotes.
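The stability argument for stem J2 and the mispaired positions in stem J can be made concrete with a crude per-helix summary. The sketch below is a rough proxy only. It lists the pairs of a bulge-free helix, flags mispairs, and reports the G:C fraction and a hydrogen-bond count (3 for G:C, 2 for A:U and for G:U wobble). Real helix stabilities are normally estimated with nearest-neighbor free-energy parameters, which this sketch does not implement. The function name and the stem sequences are hypothetical.

```python
# Canonical pairs (Watson-Crick plus G:U wobble) and their hydrogen-bond counts
HBONDS = {"GC": 3, "CG": 3, "AU": 2, "UA": 2, "GU": 2, "UG": 2}


def stem_profile(strand5: str, strand3: str) -> dict:
    """Summarize a bulge-free helix.

    strand5 and strand3 are both written 5'->3'; strand5[i] pairs with
    strand3[-1 - i], so the 3' strand is read in reverse.
    """
    if not strand5 or len(strand5) != len(strand3):
        raise ValueError("this sketch handles only non-empty, equal-length strands")
    pairs = [a + b for a, b in zip(strand5.upper(), reversed(strand3.upper()))]
    canonical = [p for p in pairs if p in HBONDS]
    gc_pairs = sum(1 for p in canonical if p in ("GC", "CG"))
    return {
        "pairs": pairs,
        "mispairs": [p for p in pairs if p not in HBONDS],
        "gc_fraction": gc_pairs / len(pairs),
        "hbonds": sum(HBONDS[p] for p in canonical),
    }


# Hypothetical 5-bp stems: fully G:C-paired, then the same stem with one mispair
print(stem_profile("GGCGC", "GCGCC"))
print(stem_profile("GGAGC", "GCGCC"))
```

In the second call, the A facing a G at the middle position is reported as the mispair "AG", lowering both the G:C fraction (0.8) and the hydrogen-bond count (12 instead of 15).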

In eukaryotes, the divergent domain D7a’, which links stems J2 and I, is always a short (1-11 nt), single-stranded motif. In contrast, D7a and D7b are longer and show a more spectacular range of length variation (from 2 to 44 nt and from 2 to 48 nt, respectively), associated with a very high divergence in sequence. Accordingly, no homology can be detected between different phyla. The situation is quite different when comparisons are performed between more closely related species. Within each of Polychaeta, Clitellata, Bivalvia, and Gastropoda, significant patterns of sequence conservation are detected. These allow the comparative approach to identify base pairings potentially supported by compensatory changes. We therefore systematically performed pairwise comparisons of the detailed foldings of D7a and D7b among all sequenced invertebrate organisms. D7b folds in an irregular unbranched stem in all the species (not shown). However, the relative positions of bulged and looped nucleotides differ in each species, revealing no constraint on the details of its folding, even between species of the same class. Within D7a, in contrast, we have identified structural features which are differentially constrained in two groups of invertebrate organisms: Polychaeta and Clitellata on the one hand (fig. 3a) and Bivalvia, Gastropoda, and Brachiopoda on the other (fig. 3b). Both form an unbranched stem which differs in the relative positions of unpaired nucleotides. Thus, stem J2 is followed by a 3-nt internal loop in Bivalvia, Gastropoda, and Brachiopoda. In Polychaeta and Clitellata, it is instead followed by two bulged nucleotides separated by a single base pair. This pair is strongly supported by the presence of four compensatory changes (G:C, U:A, A:U, and G:U) in the five species sequenced. Within each of these two groups, compensatory substitutions also provide phylogenetic evidence for several other interactions. The comparison between these group-specific structural constraints on D7a variation poses intriguing questions regarding phylogenetic relationships deduced from morphological (Brusca and Brusca 1990; Eernisse, Albert, and Anderson 1992) or molecular data (Field et al. 1988; Lake 1990). The folding pattern among Archiannelida excludes them from Polychaeta. Myzostomida does not present the D7a folding pattern shared by two Annelida classes (Polychaeta and Clitellata). Brachiopoda, represented by T. caputserpentis, folds as Bivalvia and Gastropoda do.

FIG. 3.-Group-specific constraints on the folding of D7a. D7a is boxed (full line). Regions which cannot be folded in the same secondary structure within each group (boxed by dotted lines) are not shown, and their size in nucleotide number is given. The consensus structure for each group of species is represented using dots, hyphens, and letters with the same meanings as in figure 2. a, Clitellata and Polychaeta. b, Bivalvia, Gastropoda, and Brachiopoda.
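The comparative argument used for these pairings can be sketched as a check on two alignment columns. The check asks whether every species can form a canonical pair (Watson-Crick or G:U), which pair types occur, and which species show a double change relative to a reference. A double change means that both partners differ from the reference while still pairing, as for G:C versus U:A. This is a simplified, hypothetical formulation: the function name, the reference choice, and the five-species toy alignment are assumptions and do not reproduce the authors' exact criteria or data. The toy alignment does mirror the four pair types (G:C, U:A, A:U, G:U) cited above.

```python
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def assess_pair(alignment: dict, i: int, j: int, reference: str) -> dict:
    """Evaluate a putative base pair between 0-based alignment columns i and j.

    alignment maps species name -> aligned RNA sequence (equal lengths).
    A species counts as a double (compensatory) change when both partners differ
    from the reference pair and the new combination can still pair.
    """
    ref = alignment[reference][i] + alignment[reference][j]
    pairs = {sp: seq[i] + seq[j] for sp, seq in alignment.items()}
    double_changes = sorted(
        sp for sp, p in pairs.items()
        if p in CANONICAL and p[0] != ref[0] and p[1] != ref[1]
    )
    return {
        "all_canonical": all(p in CANONICAL for p in pairs.values()),
        "pair_types": sorted(set(pairs.values())),
        "double_changes": double_changes,
    }


# Hypothetical five-species alignment; columns 1 and 4 form the putative pair
toy = {
    "sp1": "AGUUCA",  # G:C
    "sp2": "AUUUAA",  # U:A
    "sp3": "AAUUUA",  # A:U
    "sp4": "AGUUUA",  # G:U
    "sp5": "AGUUCA",  # G:C
}
print(assess_pair(toy, 1, 4, reference="sp1"))
```

Note that G:C to G:U (sp4) is a single, pairing-compatible change rather than a double one. It still supports the helix because pairing is maintained, but it is weaker evidence than the full swaps in sp2 and sp3.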

Recognition Site for L23/25

The rRNA-binding site is entirely contained in two interacting fragments identified by nuclease protection experiments (Vester and Garrett 1984), the 5’ one being interrupted by the variable domain D7a in eukaryotes (fig. 4). In a set of heterologous binding experiments, E. coli L23 and the homologous S. cerevisiae L25 bind equally well to E. coli and yeast rRNA and confer identical protections from RNase (El-Baradi et al. 1985, 1987). Surprisingly, in the protist T. thermophila, three nucleotides are processed from the tip of D7a. In this species, yeast L25 recognizes mature 26S rRNA but fails to bind a synthetic precursor rRNA fragment containing the equivalent of the binding site (Raué et al. 1990). Therefore, unprocessed D7a prevents L23/25 binding, suggesting a role for this divergent domain in rRNA-protein recognition. The differences in structural constraints observed among invertebrate groups suggest that D7a could mediate rRNA recognition by the protein in a species-specific manner. It could do so either through direct contacts with amino acids or by conferring a specific 3D conformation on the rRNA binding site. In all eukaryotes, the high thermodynamic stability of stem J2, which closes D7a, could be essential for the spatial assembly of the protein-binding site. Processing events within D7a also occur in insects. In Sciara coprophila (Ware, Renkawitz, and Gerbi 1985) and Bombyx mori (Fujiwara and Ishikawa 1986), segments of 19 and 39 nt, respectively, which correspond precisely to D7a, are excised. In contrast, in Drosophila melanogaster (Ware, Renkawitz, and Gerbi 1985), the 23 nucleotides removed cover not only D7a but also stem J2 and part of J1. This gives the insect a binding-site organization more closely related to that of eubacteria than to that of other eukaryotes. Such a dramatic modification in the rRNA-binding partner should correspond to adaptations in the protein component.
Determining the ribosomal protein L23/25 sequence in insects would be particularly useful for better defining the relationship between D7a structure and binding.

Other heterologous binding experiments demonstrate that mouse 28S rRNA cannot bind either the E. coli L23 or the yeast L25 protein (El-Baradi et al. 1987). Moreover, yeast rRNA still binds both the prokaryotic and the yeast ribosomal protein when its D7a subdomain is deleted or replaced by the mouse counterpart (Musters et al. 1991). D7a is the main difference between eukaryotes and prokaryotes in this region. These experiments indicate that D7a is dispensable for binding and, in addition, that its expansion in mouse rRNA is not by itself responsible for the failure of L23 and L25 to bind. The segments protected by L25 are remarkably conserved in both sequence and secondary structure among eukaryotes and prokaryotes. Therefore, they must contain subtle differences in the structural features directly responsible for binding specificity. These could correspond to changes in a subset of contacts between LSU rRNA and L23/25 which might have diverged, after the emergence of yeasts, in the branch leading to mammals. The putative LSU rRNA contacts with both proteins L23 and L25 have been identified in E. coli and S. cerevisiae by chemical modification protection experiments (Vester and Garrett 1984). They were recently revisited by chemical and ribonuclease footprinting using a primer extension approach (Egebjerg, Christiansen, and Garrett 1991). Several differences between the results of these two in vitro approaches show how difficult it is to identify experimentally all contacts between the two interacting molecules. Therefore, although 20 nucleotides were proposed to bind the protein directly, either weakly (14 nt) or strongly (6 nt), we cannot exclude the possibility that a few others remain to be identified.

[Figure 4 diagram not reproduced. Legible labels: 5’ end (2144) and 3’ end (2577) of the region shown; site 2, A (v) / G (sc, ec); site 6, C (v) / A (sc, ec).]

FIG. 4.-rRNA nucleotides which differ between mouse on the one hand and S. cerevisiae and E. coli on the other. The rRNAs of 63 species representative of the phyla in the three kingdoms (19 eucaryotes, 35 eubacteria, and 9 archaebacteria) were compared, and a consensus of secondary structure was drawn. Only nucleotides which are present in at least 90% of these species are shown. The 5’ and 3’ ends of the two fragments involved in ribosomal protein L23/25 recognition are located by large filled triangles. Positions protected from nucleases and chemical probes by L23/25 are circled. Empty circles identify weakly protected nucleotides, whereas shaded circles show strong protection. Full squares denote protected nucleotides identified by Vester and Garrett (1984), and filled triangles denote those identified by Egebjerg, Christiansen, and Garrett (1991). Variable positions are indicated by dots, and compensatory base changes are indicated by thick bars; regions which cannot be folded in the same secondary structure are replaced by dotted lines. Numbering corresponds to the mouse sequence. Positions which are common to S. cerevisiae and E. coli but different in mouse are shown by an arrow and identified by a number. Their patterns of variation are described in adjacent boxes. Species or groups of species are shown in brackets using the following abbreviations: m = mouse; v = vertebrates; sc = S. cerevisiae; ec = E. coli; pr = prokaryotes.

This search had two aims: to detect, among the previously identified contact sites, those likely to explain the results of the heterologous binding experiments, and to find possible new contacts. To this end, we performed a systematic search for nucleotides common to yeast and E. coli but different in mouse. Out of the 139 nt of the two protected fragments, 10 nt present this particular pattern of variation (fig. 4). The same analysis was performed on the other regions of the conserved core of secondary structure (2,090 positions tested). It shows that such nucleotides are 2.8 times more frequent in the two segments involved in the contact with L23/25. In addition, 8 of the 10 occurrences concern unpaired nucleotides, compared to only 25% in the other regions of the conserved core. Taken together, these two observations suggest that the presence of nucleotides with this pattern of variation within the two protected fragments is very unlikely to be due to chance. The most striking feature is that 8 of these 10 nucleotides lie precisely within the two short regions of the 5’ protected fragment which contain all the contact sites already identified with the protein. Thus, stems I, J, L, and M, which do not contain protected nucleotides, do not present any position with this pattern of variation. Moreover, among these 10 occurrences, the nucleotides at sites 2, 3, 5, and 6 correspond exactly to four nucleotides already suggested as protein contact points. The direct involvement of sites 2 and 6 in L23/25 binding was suggested by only one experimental approach, but it is also supported by their particular pattern of variation. Among the 16 other nucleotides protected from nucleases by the protein, 8 are strictly conserved in all living species, whereas the other 8 present a complex pattern of variation. The 6 other nucleotides identified by our evolutionary approach (sites 1, 4, 7, 8, 9, and 10) were not ascribed as contact sites by protection experiments. Interestingly, three of these nucleotides (sites 1, 7, and 8) are immediately adjacent to three contact sites, at positions -1, +2, and +1, respectively. Of the three remaining occurrences, two (sites 9 and 10) lie in the short 3’ protected fragment. This fragment does not contain any nucleotide protected by L23/25, suggesting that it could also interact with the protein. Thus, our evolutionary approach detected six new possible contacts, based on the compatibility of their pattern of variation with the results of heterologous binding experiments. These six positions could be involved in weak contacts with the amino acids of L23/25, which would make them experimentally difficult to detect.

FIG. 5.-Alignment of L23/25 ribosomal protein sequences and coincident variations with rRNA. a, Only the domain which can be aligned between all sequences is presented. When necessary, the number of amino acids not shown is indicated on the left. Species for which LSU rRNA is available are identified by a dot. T. brucei was not included in this analysis because, for this species, the phylogenetic tree of the L23/25 binding domain is not concordant with the phylogenetic tree of the LSU rRNA (Metzenberg et al. 1993). The maize sequence is not shown; it differs from O. sativa only at the penultimate amino acid (T in Z. mays, K in O. sativa). Although the sequences were aligned across all species, one reference species was retained within each of the eukaryotes, eubacteria, archaebacteria, and chloroplasts. In each group, hyphens indicate identities with the reference sequence. The RNA-binding domain and a motif involved in the binding (Rutgers et al. 1991) are underlined and boxed (thick dotted line), respectively. Thin boxes and arrows indicate positions which present the pattern of evolution searched (see text). These positions are numbered according to the rat sequence. b, Positions in the rRNA (numbered as in fig. 4) and L23/25 showing coincident variations are detailed. The number of species exhibiting the same pattern of variation in the rRNA (left part) and in L23/25 (right part) is indicated, and species are identified by their initials. Rn: R. norvegicus; Sc: S. cerevisiae; Ec: E. coli; Bs: B. stearothermophilus; arch: the two archaebacteria M. vannielii (Mv) and H. marismortui (Hm); Chl: chloroplastic sequences, Z. mays (Czm), O. sativa (Cos), N. tabacum (Cnt), and M. polymorpha (Cmp).
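The search described above can be phrased as a column-wise filter over a three-way alignment, followed by a frequency ratio between regions. The sketch below assumes gap-free comparison columns and hypothetical function names. The toy alignment is not real rRNA, and the number of matching positions outside the protected fragments is not given in the text, so no real values are plugged into `enrichment`. The binomial tail at the end is an illustrative calculation that the text does not report. It treats the 10 positions as independent draws, each with a 25% chance of being unpaired (the background rate quoted for the rest of the conserved core). Under those assumptions, P(at least 8 of 10 unpaired) = 436/4^10, about 4.2 × 10^-4.

```python
from math import comb


def pattern_sites(yeast: str, ecoli: str, mouse: str) -> list:
    """0-based alignment columns where yeast and E. coli agree but mouse differs.

    Columns containing a gap ('-') in any of the three sequences are skipped.
    """
    return [
        k for k, (y, e, m) in enumerate(zip(yeast, ecoli, mouse))
        if "-" not in (y, e, m) and y == e != m
    ]


def enrichment(hits_in: int, n_in: int, hits_out: int, n_out: int) -> float:
    """Ratio of per-position hit frequencies inside versus outside a region."""
    return (hits_in / n_in) / (hits_out / n_out)


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# Hypothetical toy alignment (not real rRNA)
print(pattern_sites("GACUA-G", "GACUAAG", "GUCUGAC"))  # [1, 4, 6]

# Illustrative check: 8 of 10 hits unpaired when ~25% would be expected
print(binom_tail(8, 10, 0.25))
```

The same filter, applied to aligned protein sequences with rat in place of mouse, corresponds to the amino acid search described in the next paragraph.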

If these nucleotides, or a subset of them, are directly involved in binding the protein, then we should observe parallel patterns of evolution in L23/25 amino acids. An attempt to identify coincident variation was previously made with a computer program designed to find similar patterns of substitution between species (Metzenberg et al. 1993). Several coincident variations were detected, but none of them can explain the heterologous binding results. In this work, we followed an evolutionary approach that incorporates the results of binding experiments, making it better focused on the rRNA-L23/25 interaction. It allowed us to identify new coincident variations between rRNA and L23/25 which could be responsible for the species specificity of binding. Rat and a protist, which diverged more than 700 MYA, share a high level of similarity in the 80 C-terminal amino acids (Metzenberg et al. 1993). The sequences of mouse and rat, which diverged less than 10 MYA, are therefore almost certainly identical. In addition, rat and mouse rRNAs are identical in the two segments protected by the protein. It is therefore of interest to search for amino acid positions common to E. coli and yeast but different in rat. We restricted this search to the best-conserved segment of the protein, which corresponds to the minimal subset of the L23/25 sequence responsible for rRNA binding (Rutgers et al. 1991). We identified five positions (amino acids 75, 76, 118, 119, and 128 in the rat sequence) fitting this criterion (fig. 5a). We then compared the detailed pattern of variation at each of these amino acid positions with that at each rRNA site of the D7 region fitting the same criterion (fig. 4). This comparison was made in the 10 species for which both the protein and the rRNA sequences were available and showed similar rates of variation, consistent with coevolution between the two interacting partners.
This examination reveals several possible coincident variations (fig. 5b). In mammals, the substitution at each of the five rRNA sites 4, 5, 7, 8, and 10 is paralleled by a replacement at position 119 of L23/25. Nevertheless, these potential coincident variations correspond to a single evolutionary change and may therefore arise by chance. More interestingly, rRNA site 1 is a cytosine in all species except rat and one eubacterium (B. stearothermophilus), the only two species with a guanine. Remarkably, these two species are also the only two with an isoleucine at position 76 of L23/25. The others have a valine (four species), a glycine (three species), or, exceptionally, a glutamic acid (M. polymorpha chloroplast). Another striking observation concerns rRNA sites 3 and 9, which are paired and thus evolve in parallel through compensatory substitutions. The G:C pairing is correlated with an aspartic acid at position 75 of L23/25 in six of eight species, whereas the A:U pairing found in S. cerevisiae and E. coli is correlated with a lysine. Unlike the change at L23/25 position 76, this amino acid substitution is nonconservative. Moreover, for the coincident variations involving rRNA site 1 and sites 3-9, at least two independent mutational events must be invoked to explain each couple: G (rRNA site 1)-isoleucine (76) and A:U (rRNA sites 3-9)-lysine (75). It would be surprising if each of these pairs of changes had occurred twice independently by chance during evolution.
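The coincident-variation test applied here amounts to comparing two partitions of the same species set. One partition groups species by their nucleotide at an rRNA site, the other by their residue at an L23/25 position. A coincidence is perfect when the species carrying a given nucleotide are exactly those carrying a given residue. The sketch below encodes the rRNA site 1 / position 76 case as stated in the text. It assumes that the 10 species are the ones abbreviated in the fig. 5b legend. Because the text does not say which species carry valine and which glycine, those are encoded with the placeholder "x". The function name is hypothetical.

```python
def coincidence(rrna: dict, protein: dict, nt_state: str, aa_state: str) -> dict:
    """Compare species sharing an rRNA state with species sharing an amino acid.

    Only species present in both mappings are considered. Perfect coincidence
    means both 'rrna_only' and 'protein_only' come back empty.
    """
    shared = rrna.keys() & protein.keys()
    with_nt = {s for s in shared if rrna[s] == nt_state}
    with_aa = {s for s in shared if protein[s] == aa_state}
    return {
        "both": with_nt & with_aa,
        "rrna_only": with_nt - with_aa,
        "protein_only": with_aa - with_nt,
    }


# Assumed: the 10 species abbreviated in the fig. 5b legend
SPECIES = ["Rn", "Sc", "Ec", "Bs", "Mv", "Hm", "Czm", "Cos", "Cnt", "Cmp"]

# rRNA site 1: C everywhere except rat and B. stearothermophilus (G)
site1 = {s: "C" for s in SPECIES}
site1.update(Rn="G", Bs="G")

# L23/25 position 76: Ile in Rn and Bs, Glu in the M. polymorpha chloroplast;
# "x" is a placeholder for Val or Gly, whose per-species assignment is not given
pos76 = {s: "x" for s in SPECIES}
pos76.update(Rn="I", Bs="I", Cmp="E")

result = coincidence(site1, pos76, "G", "I")
print(sorted(result["both"]), result["rrna_only"], result["protein_only"])
# ['Bs', 'Rn'] set() set()
```

Exceptions, such as the two G:C species that lack aspartic acid at position 75, would show up as non-empty `rrna_only` or `protein_only` sets rather than being silently ignored.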

Four of the six nucleotides strongly protected from nucleases or chemical reagents by L23/25 are highly conserved and could correspond to essential binding sites. One is highly variable and differs between mouse, yeast, and E. coli (fig. 4), but its pattern of variation does not appear compatible with the observed heterologous binding results. In contrast, site 3, which is base-paired with site 9, has a pattern of variation compatible with the experimental results. In addition, it coevolves with position 75 of L23/25, making this base pair a candidate determinant of binding specificity. As for the weakly protected sites, each of these nucleotides, taken alone, is probably not essential for binding. Taken together, however, they might play important roles in both the efficiency and the specificity of binding. The potentially coevolving rRNA site 1 and L23/25 position 76, although not detected experimentally, could also represent a new weak interaction. Such a multiplicity of contacts, especially weak ones, suggests that each interacting position is weakly constrained. Each may therefore evolve rapidly on both molecules, making coevolution between nucleic acids and amino acids relatively easy. The only condition for each substitution is that the 3D conformation of each molecule remains compatible with an efficient interaction. This could explain the exceptions observed in each coevolution. Surprisingly, it also suggests that a restricted set of contacts is of primary importance for binding specificity. Only very few of them would seem to be sufficient to explain why yeast and E. coli L23/25 failed to bind mouse rRNA in heterologous binding experiments. New in vitro binding assays, designed on the basis of these evolutionary results, are required to test the actual role of the potential interacting positions between this crucial region of the rRNA and L23/25.
Such analyses should be carried out in parallel with the development of 3D models. Our improved alignment of the LSU rRNA D7 region now makes it possible to use the comparative approach to search for tertiary interactions.

Acknowledgments

We are grateful to Dr. Jean Pierre Bachellerie for his constant interest and support and to Dr. Monique Erard for helpful comments on the manuscript. Pr. Andre Adoutte first proposed the idea of studying the phylogeny of protostome invertebrates and, with the help of Guillaume Lecointre, helped us obtain biological samples. We thank Mrs. Join and Drs. Tillier, Pourriot, Dauvin, and Gentil for their generous gift of identified material, and Dominique Vautrin for laboratory technical assistance. This work was financially supported in part by grants from the Groupement d’Intérêt Public, Groupement de Recherches et d’Étude sur les Génomes (GIP GREG) to B.M. and from M.R.T. (French Ministry of Research and Technology) to A.C.

LITERATURE CITED

BACHELLERIE, J. P., and B. MICHOT. 1989. Evolution of large subunit rRNA structure. The 3’ terminal domain contains elements of secondary structure specific to major phylogenetic groups. Biochimie 71:701-709.

BRUSCA, R. C., and G. J. BRUSCA. 1990. Invertebrates. Sinauer, Sunderland, Mass.

CHAN, Y. L., J. OLVERA, and I. G. WOOL. 1983. The structure of rat 28S ribonucleic acids inferred from the sequence of nucleotides in a gene. Nucleic Acids Res. 11:7819-7831.

CLARK, C. G., B. W. TAGUE, V. C. WARE, and S. A. GERBI. 1984. Xenopus laevis ribosomal RNA: a secondary structure model and its evolutionary and functional implications. Nucleic Acids Res. 12:6197-6220.

CORPET, F. 1989. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16:10881-10890.

CORPET, F., and B. MICHOT. 1994. RNAlign program: alignment of RNA sequences using both primary and secondary structures. Comput. Appl. Biosci. 10:389-399.

DABBS, E. R. 1980. On the mechanism of chloramphenicol-induced changes in the photoinduced affinity labeling of Escherichia coli ribosomes by puromycin. Evidence for puromycin and chloramphenicol sites. Mol. Gen. Genet. 177:271-276.

DE LANVERSIN, G., and B. JACQ. 1989. Sequence and secondary structure of the central domain of Drosophila 26S rRNA: a universal model for the central domain of the large rRNA containing the region in which the central break may happen. J. Mol. Evol. 28:403-417.

EERNISSE, D. J., J. S. ALBERT, and F. E. ANDERSON. 1992. Annelida and Arthropoda are not sister taxa: a phylogenetic analysis of spiralian metazoan morphology. Syst. Biol. 41:305-330.

EGEBJERG, J., A. CHRISTIANSEN, and R. A. GARRETT. 1991. Attachment sites of primary binding proteins L1, L2 and L23 on 23S ribosomal RNA of Escherichia coli. J. Mol. Biol. 222:251-264.

EL-BARADI, T. T. A. L., V. C. H. F. DE REGT, R. J. PLANTA, K. H. NIERHAUS, and H. A. RAUÉ. 1987. Interaction of ribosomal protein L25 from yeast and EL23 from E. coli with yeast 26S and mouse 28S rRNA. Biochimie 69:939-948.

EL-BARADI, T. T. A. L., A. H. RAUÉ, V. C. H. F. DE REGT, E. C. VERBREE, and R. J. PLANTA. 1985. Yeast ribosomal protein L25 binds to an evolutionary conserved site on yeast 26S and E. coli 23S rRNA. EMBO J. 4:2101-2107.

FIELD, K. G., G. J. OLSEN, D. J. LANE, S. J. GIOVANNONI, M. T. GHISELIN, E. C. RAFF, N. R. PACE, and R. A. RAFF. 1988. Molecular phylogeny of the animal kingdom. Science 239:748-753.

FUJIWARA, H., and H. ISHIKAWA. 1986. Molecular mechanism of introduction of the hidden break into the 28S rRNA of insects: implication based on structural studies. Nucleic Acids Res. 14:6393-6401.

GERBI, S. A. 1985. Evolution of ribosomal DNA. Pp. 419-517 in R. J. MACINTYRE, ed. Molecular evolutionary genetics. Plenum Publishing, New York, N.Y.

GERBI, S. A. 1996. Expansion segments: regions of variable size that interrupt the universal core secondary structure of ribosomal RNA. Pp. 71-87 in R. A. ZIMMERMAN and A. E. DAHLBERG, eds. Ribosomal RNA. Structure, evolution, processing and function in protein biosynthesis. CRC Press.

GORSKI, J. L., I. L. GONZALEZ, and R. D. SCHMICKEL. 1987. The secondary structure of human 28S rRNA: the structure and evolution of a mosaic rRNA gene. J. Mol. Evol. 24:236-251.

GRANT, P. G., W. Z. STRYCHARZ, E. N. JAYNES, and B. S. COOPERMAN. 1979. Antibiotic effects on the photoinduced affinity labeling of Escherichia coli ribosomes by puromycin. Biochemistry 18:2149-2154.

GRAY, M. W., and M. N. SCHNARE. 1996. Evolution of rRNA gene organization. Pp. 49-69 in R. A. ZIMMERMAN and A. E. DAHLBERG, eds. Ribosomal RNA. Structure, evolution, processing and function in protein biosynthesis. CRC Press.

GUTELL, R. R. 1996. Comparative sequence analysis and the structure of 16S and 23S rRNA. Pp. 111-128 in R. A. ZIMMERMAN and A. E. DAHLBERG, eds. Ribosomal RNA. Structure, evolution, processing and function in protein biosynthesis. CRC Press.

HASSOUNA, N., B. MICHOT, and J.-P. BACHELLERIE. 1984. The complete nucleotide sequence of mouse 28S rRNA gene. Implications for the process of size increase of the large subunit rRNA eukaryotes. Nucleic Acids Res. 12:3563-3583.

HEUS, H. A., and A. PARDI. 1991. Structural features that give rise to the unusual stability for RNA hairpins containing GNRA loops. Science 253:191.

KOCHER, T. D., W. K. THOMAS, A. MEYER, S. V. EDWARDS, S. PÄÄBO, F. X. VILLABLANCA, and A. C. WILSON. 1989. Dynamics of the mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proc. Natl. Acad. Sci. USA 86:6196-6200.

KOOI, E. A., C. A. RUTGERS, M. J. KLEIJMEER, J. VAN’T RIET, J. VENEMA, and H. A. RAUÉ. 1994. Mutational analysis of the C-terminal region of Saccharomyces cerevisiae ribosomal protein L25 in vitro and in vivo demonstrates the presence of two distinct functional elements. J. Mol. Biol. 240:243-255.

LAKE, J. A. 1990. Origin of the metazoa. Proc. Natl. Acad. Sci. USA 87:763-766.

METZENBERG, S., C. JOBLET, P. VERSPIEREN, and N. AGABIAN. 1993. Ribosomal protein L25 from Trypanosoma brucei: phylogeny and molecular co-evolution of an rRNA-binding protein and its binding site. Nucleic Acids Res. 21:4936-4940.

MICHOT, B., and J. P. BACHELLERIE. 1987. Comparisons of large subunit rRNAs reveal some eukaryote-specific elements of secondary structure. Biochimie 69:11-23.

MICHOT, B., N. HASSOUNA, and J. P. BACHELLERIE. 1984. Secondary structure of mouse 28S rRNA and general model for the folding of the large rRNA in eukaryotes. Nucleic Acids Res. 12:4259-4279.

MICHOT, B., L. H. QU, and J. P. BACHELLERIE. 1990. Evolution of large-subunit rRNA structure. The diversification of divergent D3 domain among major phylogenic groups. Eur. J. Biochem. 188:219-252.

MUSTERS, W., K. BOON, C. A. F. M. VAN DER SANDE, H. VAN HEERIKHUIZEN, and R. J. PLANTA. 1990. Functional analysis of transcribed spacers of yeast ribosomal DNA. EMBO J. 9:3989-3996.

MUSTERS, W., P. M. GONÇALVES, K. BOON, H. A. RAUÉ, H. VAN HEERIKHUIZEN, and R. J. PLANTA. 1991. The conserved GTPase center and variable region V9 from Saccharomyces cerevisiae 26S rRNA can be replaced by their equivalents from other prokaryotes or eukaryotes without detectable loss of ribosomal function. Proc. Natl. Acad. Sci. USA 88:1469-1473.

O-I-TEN, L., l? DERUFFRAY, l? LAJUDIE, and B. MICHOT. 1996. Sequence and characterisation of a ribosomal RNA operon from Agrobacterium vitis. Mol. Gen. Genet. 251:99-107.

P~LANDAKIS, M., and M. SOLIGNAC. 1993. Molecular phylog- eny of Drosophila based on ribosomal sequences. J. Mol. Evol. 37525-543.

PERASSO, R., A. BAROIN, L. H. Qu, J. I? BACHELLERIE, and A. ADOU-~TE. 1989. Origin of the algae. Nature 339:142-144.

PEROCHON-DORISSE, J., E CHETOUANI, S. AUREL, N. ISCOLO, and B. MICHOT. 1995. RNA_d2: a computer program for editing and display of RNA secondary structures. Comput. Appl. Biosci. 11: 101-109.

PHILIPPE, H., A. CHENUIL, and A. ADOU-E. 1994. Can the Cambrian explosion be inferred through molecular phylog- eny? Development 120: 15-25.

Qu, L. H., M. NICOLOSO, and J. I? BACHELLERIE. 1988. Phy- logenetic calibration of the 5’ terminal domain of large rRNA achieved by determining twenty eucaryotic sequenc- es. J. Mol. Evol. 28: 113-124.

RAUÉ, H. A., W. MUSTERS, C. RUTGERS, J. V. RIET, and R. J. PLANTA. 1990. rRNA: from structure to function. Pp. 217-235 in W. E. HILL, A. DAHLBERG, R. A. GARRETT, P. B. MOORE, D. SCHLESSINGER, and J. R. WARNER, eds. The ribosome. Structure, function and evolution. American Society for Microbiology, Washington, D.C.

ROUSSET, F., M. PÉLANDAKIS, and M. SOLIGNAC. 1991. Evolution of compensatory substitutions through G·U intermediate state in Drosophila ribosomal RNA. Proc. Natl. Acad. Sci. USA 88:10032-10036.

RUTGERS, C. A., M. J. RIENTJES, J. VAN’T RIET, and H. A. RAUÉ. 1991. rRNA binding domain of yeast ribosomal protein L25. Identification of its border and a key leucine residue. J. Mol. Biol. 218:375-385.

SCHNARE, M. N., S. H. DAMBERGER, M. W. GRAY, and R. R. GUTELL. 1996. Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23S-like) ribosomal RNA. J. Mol. Biol. 256:701-719.

SWEENEY, R., L. CHEN, and M. C. YAO. 1994. An rRNA variable region has an evolutionarily conserved essential role despite sequence divergence. Mol. Cell. Biol. 14:4203-4215.

SWEENEY, R., and M. C. YAO. 1989. Identifying functional regions of rRNA by insertion mutagenesis and complete gene replacement in Tetrahymena thermophila. EMBO J. 8:933-938.

VARANI, G., C. CHEONG, and I. TINOCO. 1991. Structure of an unusually stable RNA hairpin. Biochemistry 30:3280.

VESTER, B., and R. A. GARRETT. 1984. Structure of a protein L23-RNA complex located at the A-site domain of the ribosomal peptidyl transferase centre. J. Mol. Biol. 179:431-452.

WARE, V. C., R. RENKAWITZ, and S. A. GERBI. 1985. rRNA processing: removal of only nineteen bases at the gap between 28S alpha and 28S beta rRNAs in Sciara coprophila. Nucleic Acids Res. 13:3581-3597.

WOESE, C. R., and N. PACE. 1993. Probing RNA structure, function, and history by comparative analysis. Pp. 91-117 in R. F. GESTELAND and J. F. ATKINS, eds. The RNA world. CSHL Press.

MANOLO GOUY, reviewing editor

Accepted February 6, 1997

Note added in proof.

While our manuscript was being processed for publication, new results on the binding of protein L23/25 were reported (Jeeninga, R. E., Venema, J. and Raué, H. A. (1996) J. Mol. Biol. 263:648-656). In that article, the authors, contradicting previous experimental results (El-Baradi, T. T. A. L., De Regt, V. C. H. F., Planta, R. J., Nierhaus, K. H. and Raué, H. A. (1987) Biochimie 69:939-948), show that rat RL23a and S. cerevisiae L25 are functionally equivalent and bind almost equally efficiently to both yeast and mammalian rRNA. These new findings fit well with our conclusion, based on the detection of only a few significant structural differences between the mammalian and yeast L23/25-rRNA binding sites.