

Proc. Natl. Acad. Sci. USA
Vol. 83, pp. 6030-6034, August 1986
Genetics

Genes for the eight ribosomal proteins are clustered on the chloroplast genome of tobacco (Nicotiana tabacum): Similarity to the S10 and spc operons of Escherichia coli

(molecular cloning/DNA sequence/open reading frame/intron/blot hybridization)

MINORU TANAKA*, TATSUYA WAKASUGI*, MAMORU SUGITA†, KAZUO SHINOZAKI*, AND MASAHIRO SUGIURA*†
*Center for Gene Research, Nagoya University, Chikusa, Nagoya 464, Japan; and †Department of Botany, Hokkaido University, Sapporo 060, Japan

Communicated by Dan L. Lindsley, April 21, 1986

ABSTRACT The nucleotide sequence of a tobacco (Nicotiana tabacum) chloroplast gene cluster that encodes eight proteins homologous to Escherichia coli ribosomal proteins L23, L2, S19, L22, S3, L16, L14, and S8 has been determined. RNA gel blot hybridization revealed that all eight coding regions are expressed in the chloroplasts. The arrangement of the eight genes resembles that found in the E. coli S10 and spc operons. Among the eight genes, the L2 and L16 genes contain 666- and 1020-base-pair introns, respectively. These intron boundary sequences are consistent with the conserved boundary sequences of the chloroplast group III introns [Shinozaki, K., Deno, H., Sugita, M., Kuramitsu, S. & Sugiura, M. (1986) Mol. Gen. Genet. 202, 1-5].

Chloroplast ribosomes in higher plants are 70S in size and contain 23S, 16S, 5S, and 4.5S RNAs and about 60 ribosomal proteins (1). Analyses of the synthesis of ribosomal proteins in isolated chloroplasts have shown that a chloroplast genome encodes about one-third of the ribosomal proteins in higher plant species (2, 3). Since the identification and sequencing of the first tobacco chloroplast small subunit ribosomal protein (CS) gene for CS19 (4), several additional genes for chloroplast ribosomal proteins located in chloroplast DNAs have been identified through their homology with Escherichia coli ribosomal protein genes (5-9). Because of the success of this approach, we searched for further ribosomal protein genes in tobacco chloroplast DNA by hybridization with λfus3 DNA, which contains four E. coli ribosomal protein operons (e.g., ref. 25). We sequenced the DNA region that hybridized strongly with the E. coli probe and found open reading frames (ORFs) whose amino acid sequences resemble those of E. coli ribosomal proteins.

Here we describe a gene cluster encoding eight ribosomal proteins in tobacco chloroplast DNA. The eight tobacco chloroplast genes are arranged in the same order as the corresponding genes in the E. coli S10 and spc operons. Nevertheless, two of the eight genes contain long introns in spite of their homologies with genes encoding the corresponding E. coli ribosomal proteins.

MATERIALS AND METHODS

The transducing phage λfus3 was kindly provided by K. Isono, and its 10 and 4.6% EcoRI fragments that contain parts of the S10 and spc operons of E. coli were used as probes. Southern blot hybridization was performed in 28% (vol/vol) formamide, 1 M NaCl, 10 mM Tris-HCl (pH 7.5), and 1x Denhardt's solution at 37°C for 24 hr, and filters were washed in 6x SSC at 37°C for 2 hr. (1x SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.0.)

Recombinant plasmids pTBa1, pTBa7, pTBa8, and pTP10 containing 19.3-, 5.0-, and 4.8-kilobase-pair (kbp) BamHI fragments and a 2.9-kbp Pst I fragment of Nicotiana tabacum var. Bright Yellow 4 chloroplast DNA, respectively, were constructed as described using pBR322 (ref. 10 and Fig. 1). The DNA sequence was determined by a combination of the chemical method (11) and the dideoxy chain-termination method (12) using mp10/11 and E. coli JM109. DNA sequences were analyzed using the GENETYX software system (Software Development, Tokyo). RNA gel blot hybridization was carried out as described (13).

RESULTS

DNA Sequence. BamHI digests of tobacco chloroplast DNA blotted to nylon filter sheets were hybridized with nick-translated λfus3 DNA fragments that carried portions of the E. coli S10 and spc operons. The DNA probes hybridized strongly to the 5.0-kbp BamHI fragment (Ba7), moderately to the 19.3- and 4.8-kbp BamHI fragments (Ba1 and Ba8, respectively), and weakly to several other fragments (data not shown). A part of the Ba7 fragment has previously been sequenced and shown to contain the gene for the ribosomal protein CS19 (rps19 gene product) (4), and the junction (JLB) between the inverted repeat B (IRB) and the large single-copy region (14).

For the present study, we sequenced the entire Ba7 fragment and the adjacent part of the Ba1 fragment by the strategy shown in Fig. 2. We used the 2.9-kbp Pst I fragment (Ps10) to confirm that there were no small BamHI pieces between the Ba1 and Ba7 fragments (see Fig. 1). Fig. 3 shows the DNA sequence of a 6207-bp portion (the left end of Ba7 to the Taq I site in S11, see Fig. 2). The Ba8 fragment lies in a symmetrical position to Ba7 on the circular chloroplast DNA (see Fig. 1) and has been shown to contain the junction (JLA) between the inverted repeat A (IRA) and the large single-copy region, trnH (14), psbA (15), and the 3' exon of trnK (16). We also sequenced the remaining portion (within IRA) of the Ba8 fragment by the same strategy (Fig. 2). The 2098-bp sequence (JLA to the first BamHI site) of IRA was found to be completely identical to the corresponding sequence (JLB to the first BamHI site) of IRB. Determination of the 6207-bp sequence revealed that there is a gene for tRNA-Ile (details will be published elsewhere) and 11 ORFs on the same DNA strand (strand B).
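The kind of single-strand ORF search that yields a list such as the 11 ORFs above can be sketched in a few lines. This is only an illustrative sketch, not the GENETYX procedure the authors used. The function name `find_orfs`, the `min_codons` threshold, and the choice to accept GTG as well as ATG as a start codon are assumptions made for the example.

```python
# Illustrative ORF scan on one strand; not the authors' GENETYX analysis.
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # assumption: GTG is accepted as an alternative start


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, n_codons) tuples for the given strand.

    Coordinates are 1-based and inclusive; `end` includes the stop codon,
    while `n_codons` counts sense codons only (the stop is excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # toy input, not tobacco sequence
    print(find_orfs(toy))
```

On a real sequence one would raise `min_codons` considerably. Candidate ORFs would still need the homology comparisons described in the text before being called genes.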

Abbreviations: bp, base pairs; CL, chloroplast large subunit ribosomal protein; CS, chloroplast small subunit ribosomal protein; ORF, open reading frame.
*To whom reprint requests should be addressed.


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


FIG. 1. Positions of the cloned fragments and the genes for the ribosomal proteins on the Sal I cleavage map of tobacco chloroplast DNA. The genes for the large subunit of ribulose bisphosphate carboxylase (rbcL), the 32-kDa thylakoid membrane protein (psbA), and the rRNA operons (rrnA and rrnB) are marked. JLA and JLB are the junctions between the inverted repeats A and B (IRA and IRB) and the large single-copy region (LSC).

Gene Cluster Coding for Proteins Homologous to E. coli Ribosomal Proteins. We have reported the sequence of the rps19 gene in tobacco chloroplast DNA (4). ORF5 shown here corresponds to rps19 (Fig. 3). rps19 has been reported in spinach (7), Nicotiana debneyi (7), and duckweed (17). The gene for the chloroplast large subunit ribosomal protein (CL) CL2 (rpl2) has been found upstream from rps19 in spinach and N. debneyi chloroplast DNAs, and the N. debneyi rpl2 has been found to contain a 666-bp intron (7). Based on their high homology with the N. debneyi rpl2, ORF2 and ORF4 represent the tobacco (N. tabacum) rpl2, which also contains a 666-bp intron (positions 943-1608). The 62-codon ORF3 is within the intron and is unlikely to be a gene for a ribosomal protein. The tobacco CL2 protein, deduced from the DNA sequence, shows 85, 82, and 48% sequence homologies with the deduced N. debneyi CL2, spinach CL2, and E. coli L2 proteins, respectively (all protein sequences are deduced from the DNA sequences hereafter).

The proteins predicted from ORF1, ORF6, and ORF7 show 23, 26, and 38% sequence homologies with the E. coli L23, L22, and S3 proteins, respectively. In the case of S3 proteins, several gaps were introduced to maximize the homology. The E. coli S10 operon consists of the genes for S10, L3, L4, L23, L2, S19, L22, S3, L16, L29, and S17 in this order (18). The order of ORF1, ORF2-4 (rpl2), ORF5 (rps19), ORF6, and ORF7 is the same as that of the S10 operon. Therefore, we propose that ORF1, ORF6, and ORF7 are the genes for the CL23, CL22, and CS3 proteins (rpl23, rpl22, and rps3), respectively. The deduced polypeptide for ORF9 showed sequence homology with the E. coli L16 protein. When ORF9 is combined with the short 3-codon ORF (positions 3690-3698) between ORF7 and ORF8, and a 1020-bp insertion between them is assumed, the product is more similar to the E. coli L16 protein (56% homology). We, therefore, propose that the short ORF plus ORF9 is the gene for CL16 (rpl16). The intron sequence was assigned to positions 3699-4718 by comparing the conserved intron-exon boundary sequences of the chloroplast group III introns (9) and the E. coli L16 sequence. The intron is 1020 bp long, which is the longest intron so far analyzed in chloroplast genes for proteins. The 80-codon ORF8 is in this intron and showed no homology with any of the E. coli ribosomal proteins or with ORF3 in the rpl2 intron.

The proteins derived from ORF10 and ORF11 have 55 and 42% (with gaps) sequence homologies with the E. coli L14 and S8 proteins, respectively, suggesting that ORF10 and ORF11 are the genes for CL14 and CS8 (rpl14 and rps8). The E. coli spc operon contains the genes for L14, L24, L5, S14, S8, and so on in this order (19). The order of ORF10 (rpl14) and ORF11 (rps8) is similar to the spc operon when the genes for L24, L5, and S14 are deleted. The gene for CS14 has been found before the tRNA-fMet gene in the middle of the large single-copy region of liverwort chloroplast DNA (8) and tobacco chloroplast DNA (unpublished data). The deduced amino acid sequences of the seven genes for the chloroplast ribosomal proteins so far reported showed 36-68% homologies with those of E. coli counterparts (4-9). Therefore, sequence homologies between the proteins deduced from the eight tobacco ORFs and the corresponding E. coli ribosomal proteins (Fig. 4) together with the coincidence of their order (Fig. 5) indicate that these eight ORFs are most likely to be the genes for those ribosomal proteins encoded by the chloroplast DNA.

FIG. 2. Physical map of the cloned Ba7, Ba8, Ps10 fragments, a part of the cloned Ba1 fragment of tobacco chloroplast DNA, and the strategy for sequencing parts of them. The locations of ORFs and genes are shown in the upper boxes. Arrows show the direction and extent of the DNA regions sequenced by the chemical method and the dideoxy chain-termination method.

[Fig. 3 sequence listing not reproduced: positions 1-6207 of strand B, from the BamHI site to the Taq I site, with deduced amino acid sequences. Labeled features, in order: tRNA-Ile, rpl23 = ORF1, rpl2 = ORF2-4 (ORF3 within the intron), rps19 = ORF5, rpl22 = ORF6, rps3 = ORF7, rpl16 (ORF8 within the intron, followed by ORF9), rpl14 = ORF10, rps8 = ORF11.]

FIG. 3. DNA sequence of the 6207-bp region containing the genes for the eight ribosomal proteins of tobacco chloroplasts. The RNA-like strand (strand B) is presented. Coding regions including introns are boxed. The deduced amino acid sequences are shown below the DNA sequences. Triangles indicate possible intron sites. Start and stop codons of the 11 ORFs are underlined twice. Sequences similar to the -35 region, the -10 region, and the Shine-Dalgarno signal are underlined. The sequence between positions 2023 and 2502 has been reported (4).

[Fig. 4 alignments not reproduced: CL23/EL23, CL2/EL2, CS19/ES19, CL22/EL22, CS3/ES3, CL16/EL16, CL14/EL14, and CS8/ES8.]

FIG. 4. Comparisons of the deduced amino acid sequences of tobacco chloroplast ribosomal proteins (C) with those of corresponding E. coli ribosomal proteins (E). Homologous residues are boxed. Triangles indicate intron sites, and numerals indicate amino acid residues.
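The intron and exon coordinates quoted in this section can be cross-checked with simple interval arithmetic. The sketch below only re-derives lengths from the positions given in the text. The dictionary keys and the helper name `span_length` are labels chosen for this example.

```python
# Coordinates quoted in the Results (1-based, inclusive).
FEATURES = {
    "rpl2 intron": (943, 1608),
    "rpl16 short 5' ORF": (3690, 3698),
    "rpl16 intron": (3699, 4718),
}


def span_length(start, end):
    """Length in bp of a 1-based inclusive interval."""
    return end - start + 1


lengths = {name: span_length(*coords) for name, coords in FEATURES.items()}

# The short ORF is described as 3 codons long.
short_orf_codons = lengths["rpl16 short 5' ORF"] // 3

# The short ORF should end immediately before the intron begins.
abuts_intron = FEATURES["rpl16 short 5' ORF"][1] + 1 == FEATURES["rpl16 intron"][0]

if __name__ == "__main__":
    for name, n_bp in lengths.items():
        print(f"{name}: {n_bp} bp")
    print("short ORF codons:", short_orf_codons)
    print("short ORF abuts intron:", abuts_intron)
```

The computed lengths agree with the 666-bp and 1020-bp intron sizes and with the 3-codon short ORF stated in the text.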

Expression of the Eight ORFs. Total tobacco chloroplast RNA extracted from young tobacco leaves was electrophoresed, transferred to nylon membrane sheets, and hybridized with nick-translated probes containing each ORF. All eight probes hybridized to several RNA bands ranging from 0.3 to 4.5 kilobases as listed in Table 1. These results indicate that all eight ORFs are expressed in the chloroplasts and that at least some of the ORFs are cotranscribed. We, therefore, concluded that the eight ORFs represent the genes for the chloroplast ribosomal proteins. This is the first example of a long ribosomal protein gene cluster found in a chloroplast genome.

DISCUSSION

The genes for eight ribosomal proteins, rpl23, rpl2, rps19, rpl22, rps3, rpl16, rpl14, and rps8, are clustered in this order in the tobacco chloroplast genome. Surprisingly, this order corresponds to that of the homologous genes in the E. coli S10 and spc operons. This finding raises the interesting possibility that the genes for ribosomal proteins of chloroplast and E. coli may have evolved from a common ancestral gene set. The rpl23 and rpl2 genes lie in the 26-kbp inverted repeat of the chloroplast DNA, indicating that these genes are present in two copies. We have sequenced both copies of rpl23 and rpl2 and found them to be identical, suggesting that the inverted repeats A and B are highly homologous or identical throughout the entire 26-kbp sequence.
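The statement that the tobacco gene order corresponds to the E. coli order can be expressed as a subsequence test. In this sketch the gene lists are copied from the Results. Concatenating the S10 operon with the start of the spc operon follows the side-by-side layout of Fig. 5 and is an assumption of the example, as are the variable and function names.

```python
# Gene orders as quoted in the Results (protein names, not locus names).
ECOLI_S10 = ["S10", "L3", "L4", "L23", "L2", "S19", "L22", "S3", "L16", "L29", "S17"]
ECOLI_SPC_START = ["L14", "L24", "L5", "S14", "S8"]  # the text continues "and so on"
TOBACCO = ["L23", "L2", "S19", "L22", "S3", "L16", "L14", "S8"]


def is_subsequence(sub, full):
    """True if the items of `sub` occur in `full` in the same relative order."""
    remaining = iter(full)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in sub)


# Assumption: treat the two operons as one ordered list, as drawn in Fig. 5.
ecoli_order = ECOLI_S10 + ECOLI_SPC_START
colinear = is_subsequence(TOBACCO, ecoli_order)

# E. coli genes lying between L23 and S8 that have no counterpart in the cluster.
window = ecoli_order[ecoli_order.index("L23"): ecoli_order.index("S8") + 1]
absent = [gene for gene in window if gene not in TOBACCO]

if __name__ == "__main__":
    print("same relative order:", colinear)
    print("absent from tobacco cluster:", absent)
```

The list `absent` contains L24, L5, and S14, the genes the Results describe as deleted relative to the spc operon, together with L29 and S17 from the end of the S10 operon.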

E. coli, S10 operon: L23 L2 S19 L22 S3 L16 L29 S17
E. coli, spc operon: L14 L24 L5 S14 S8
Tobacco chloroplast: CL23 CL2 CS19 CL22 CS3 CL16 CL14 CS8

FIG. 5. Comparison of the gene arrangement of the tobacco chloroplast ribosomal protein gene cluster with those of the E. coli S10 and spc operons. Boxes indicate introns.


Table 1. Major RNA bands detected by blot hybridization

Gene probe*    Bands, kilobases†
rpl23          1.8, 1.0, 0.3
rpl2           2.8, 1.6, 1.0, 0.3
rps19          3.3, 1.9, 1.5
rpl22          4.5, 3.3, 2.1, 1.3
rps3           3.0, 1.6, 0.9, 0.5
rpl16          3.8, 1.8, 1.3, 0.3
rpl14          4.0, 2.1, 1.5, 0.7
rps8           3.3, 1.0, 0.6, 0.3

*Probes used are 343-bp Ava II (rpl23), 298-bp Acc I-Rsa I (rpl2), 189-bp Sma I-Xba I (rps19), 217-bp Xba I-Pst I (rpl22), 382-bp Sal I-Xba I (rps3), 132-bp Xba I-BamHI (rpl16), 229-bp Aha III-Pst I (rpl14), and 283-bp Sal I-HindIII (rps8) fragments.
†Approximate sizes in kilobases estimated using tobacco mosaic virus RNA and E. coli 23S, 16S, and 4S RNA as size markers.
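Table 1 is easier to query once it is held as a small data structure. The sketch below transcribes the table and pulls out the overall size range, the small bands, and the band sizes seen with more than one probe. The variable names and the 0.5-kilobase cutoff for "small" are choices made for this example. A shared band size is compatible with a common transcript but does not by itself demonstrate cotranscription.

```python
from collections import defaultdict

# Table 1 transcribed: probe gene -> major RNA band sizes in kilobases.
BANDS_KB = {
    "rpl23": [1.8, 1.0, 0.3],
    "rpl2": [2.8, 1.6, 1.0, 0.3],
    "rps19": [3.3, 1.9, 1.5],
    "rpl22": [4.5, 3.3, 2.1, 1.3],
    "rps3": [3.0, 1.6, 0.9, 0.5],
    "rpl16": [3.8, 1.8, 1.3, 0.3],
    "rpl14": [4.0, 2.1, 1.5, 0.7],
    "rps8": [3.3, 1.0, 0.6, 0.3],
}

all_bands = [size for sizes in BANDS_KB.values() for size in sizes]
size_range = (min(all_bands), max(all_bands))

# Assumed cutoff: call a band "small" if it is 0.5 kilobases or shorter.
SMALL_KB = 0.5
small_bands = {
    gene: [size for size in sizes if size <= SMALL_KB]
    for gene, sizes in BANDS_KB.items()
    if any(size <= SMALL_KB for size in sizes)
}

# Band sizes detected by more than one probe.
probes_by_size = defaultdict(list)
for gene, sizes in BANDS_KB.items():
    for size in sizes:
        probes_by_size[size].append(gene)
shared_sizes = {size: genes for size, genes in probes_by_size.items() if len(genes) > 1}

if __name__ == "__main__":
    print("size range (kb):", size_range)
    print("small bands:", small_bands)
    for size in sorted(shared_sizes, reverse=True):
        print(f"{size} kb:", ", ".join(shared_sizes[size]))
```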

The rpl2 and rpl16 genes were found to contain 666- and 1020-bp introns, respectively. A 666-bp intron has been reported in the N. debneyi rpl2 but not in the spinach rpl2 (7). We assigned the tobacco rpl2 intron site between the first and second nucleotides of the 131st threonine codon (A|CC) so as to match the conserved intron boundary sequences of chloroplast group III introns (9). This site is shifted one nucleotide from the N. debneyi site previously suggested (7). The rpl2 and rpl16 introns contain the 62-codon ORF3 and the 80-codon ORF8, respectively, both starting with GTG and ending with TAA. There is no significant homology between ORF3 and ORF8. It should be noted that the 2526-bp intron of the tobacco gene for tRNALys (UUU) has a 509-codon ORF (16). At present no function of these ORFs is known.

The relatively low homology (23%) of L23 proteins between tobacco and E. coli may be due partly to their location in ribosomes. The L23 protein has been reported to be located near the base of the crown projection in the 50S subunits, which has been one of the more variable regions during the course of evolution (20). The tobacco CL22 (155 residues) is 45 residues longer than the E. coli L22 (110 residues), which makes their calculated homology lower (26%). Another unique feature is that the rpl22 coding region overlaps the rps3 coding region by 13 bp. High homologies of L2, L16, S8, and S19 proteins between tobacco and E. coli (42-57%) may be the reflection of their important functions in the ribosomes; the L2 and L16 proteins are known to be involved in peptidyltransferase activity (21) and the S8 protein to bind to 16S rRNA in the early stage of the 30S subunit assembly (22).

Long transcripts that encompass several ribosomal protein genes have been detected in the chloroplasts, indicating that at least some of the genes are transcribed polycistronically. Further studies are necessary to determine whether the eight genes discussed in this paper constitute a single operon. Interestingly, small RNA bands of 0.3-0.5 kilobases, which are shorter than most of the genes, were clearly detected. They may be degradation products of the mRNAs. As these small RNAs seem to be stable, another possibility would be that they are some elements in posttranscriptional regulation or processing of the ribosomal protein mRNAs. Short RNA species have also been observed among transcripts of the putative gene cluster for the apocytochrome b6 and the subunit 4 of the cytochrome b/f complex in spinach (23) and the gene cluster for the ATPase subunits I and III in tobacco chloroplasts (24), but no apparent short RNAs have been found in the monocistronic genes of tobacco chloroplasts so far examined.
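The small RNA bands mentioned above can be picked out of Table 1 mechanically. The following is a minimal sketch, not part of the original analysis: the band sizes are transcribed from Table 1, while the function name `small_bands` and the 0.5-kilobase cutoff parameter are illustrative assumptions chosen to match the 0.3-0.5 kilobase range quoted in the text.

```python
# Band sizes (kilobases) transcribed from Table 1, keyed by gene probe.
TABLE_1_BANDS_KB = {
    "rpl23": [1.8, 1.0, 0.3],
    "rpl2": [2.8, 1.6, 1.0, 0.3],
    "rps19": [3.3, 1.9, 1.5],
    "rpl22": [4.5, 3.3, 2.1, 1.3],
    "rps3": [3.0, 1.6, 0.9, 0.5],
    "rpl16": [3.8, 1.8, 1.3, 0.3],
    "rpl14": [4.0, 2.1, 1.5, 0.7],
    "rps8": [3.3, 1.0, 0.6, 0.3],
}


def small_bands(bands_by_probe, max_kb=0.5):
    """Return {probe: [bands <= max_kb]} for probes that detect such a band.

    max_kb is a hypothetical cutoff, set here to the upper end of the
    0.3-0.5 kilobase range described in the text.
    """
    result = {}
    for probe, bands in bands_by_probe.items():
        hits = [b for b in bands if b <= max_kb]
        if hits:
            result[probe] = hits
    return result


if __name__ == "__main__":
    for probe, bands in small_bands(TABLE_1_BANDS_KB).items():
        print(probe, bands)
```

With this cutoff, five of the eight probes (rpl23, rpl2, rps3, rpl16, and rps8) report a band in the small-RNA range.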

In the 165-bp region between the gene for tRNAIle and rpl23, sequences similar to the E. coli -35 and -10 regions were found, suggesting that the initiation site of transcription is located before rpl23. No promoter-like sequences were observed in the 127-bp spacer between rpl16 and rpl14, although the E. coli genes for the L16 and L14 proteins belong to separate operons, S10 and spc, respectively. Shine-Dalgarno-like sequences were found in front of rps19, rpl22, rps3, and rpl14 coding regions but not in the other genes. It is important to elucidate what additional regulatory sequences may be involved in the coordinate synthesis of ribosomal proteins in the chloroplasts.
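A search for sequences resembling the E. coli -35 and -10 regions can be sketched as follows. This is an illustration only; the paper does not state the criteria used. The consensus hexamers TTGACA and TATAAT are the standard E. coli ones, whereas the function name `find_promoter_like`, the mismatch tolerance, the allowed spacer range, and the example sequence are all assumptions introduced here.

```python
MINUS_35 = "TTGACA"  # E. coli -35 consensus hexamer
MINUS_10 = "TATAAT"  # E. coli -10 consensus hexamer


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_like(seq, max_mismatch=2, spacer=(15, 19)):
    """Return (pos35, box35, pos10, box10) tuples with 0-based positions.

    max_mismatch and spacer are hypothetical thresholds: each hexamer may
    differ from its consensus at up to max_mismatch positions, and the two
    hexamers must be separated by spacer[0]..spacer[1] nucleotides.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        box35 = seq[i:i + 6]
        if mismatches(box35, MINUS_35) > max_mismatch:
            continue
        for gap in range(spacer[0], spacer[1] + 1):
            j = i + 6 + gap
            box10 = seq[j:j + 6]
            if len(box10) == 6 and mismatches(box10, MINUS_10) <= max_mismatch:
                hits.append((i, box35, j, box10))
    return hits


if __name__ == "__main__":
    # Made-up sequence with perfect hexamers separated by 17 nucleotides.
    example = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GG"
    print(find_promoter_like(example, max_mismatch=0))
```

Loosening `max_mismatch` finds more candidates but also more chance matches, particularly in A+T-rich spacer DNA, so hits from a scan of this kind are only suggestive.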

Note Added in Proof. In the region downstream from rps8, we have found additional sequences homologous to the E. coli infA, secX, rpsK, and rpoA in this order on the same strand.

We thank Dr. R. A. Bonchard for editing this manuscript. This work was supported in part by a grant-in-aid from the Ministry of Education, Science and Culture, Japan.

1. Dyer, T. A. (1984) in Chloroplast Biogenesis, eds. Baker, N. R. & Barber, J. (Elsevier, Amsterdam), pp. 23-69.

2. Eneas-Filho, J., Hartley, M. R. & Mache, R. (1981) Mol. Gen. Genet. 184, 484-488.

3. Dorne, A. M., Lescure, A. M. & Mache, R. (1984) Plant Mol. Biol. 3, 83-90.

4. Sugita, M. & Sugiura, M. (1983) Nucleic Acids Res. 11, 1913-1918.

5. Subramanian, A. R., Steinmetz, A. & Bogorad, L. (1983) Nucleic Acids Res. 11, 5277-5286.

6. Montandon, P. E. & Stutz, E. (1984) Nucleic Acids Res. 12, 2851-2859.

7. Zurawski, G., Bottomley, W. & Whitfeld, P. R. (1984) Nucleic Acids Res. 12, 6547-6558.

8. Umesono, K., Inokuchi, H., Ohyama, K. & Ozeki, H. (1984) Nucleic Acids Res. 12, 9551-9565.

9. Shinozaki, K., Deno, H., Sugita, M., Kuramitsu, S. & Sugiura, M. (1986) Mol. Gen. Genet. 202, 1-5.

10. Sugiura, M. & Kusuda, J. (1979) Mol. Gen. Genet. 172, 137-141.

11. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.

12. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.

13. Ohme, M., Kamogashira, T., Shinozaki, K. & Sugiura, M. (1985) Nucleic Acids Res. 13, 1045-1056.

14. Sugita, M., Kato, A., Shimada, H. & Sugiura, M. (1984) Mol. Gen. Genet. 194, 200-205.

15. Sugita, M. & Sugiura, M. (1984) Mol. Gen. Genet. 195, 308-313.

16. Sugita, M., Shinozaki, K. & Sugiura, M. (1985) Proc. Natl. Acad. Sci. USA 82, 3557-3561.

17. Posno, M., Torenvliet, D. J., Lustig, H., van Noort, M. & Groot, G. S. P. (1985) Curr. Genet. 9, 211-219.

18. Zurawski, G. & Zurawski, S. M. (1985) Nucleic Acids Res. 13, 4521-4526.

19. Cerretti, D. P., Dean, D., Davis, G. R., Bedwell, D. M. & Nomura, M. (1983) Nucleic Acids Res. 11, 2599-2616.

20. Noller, H. F. (1984) Annu. Rev. Biochem. 53, 119-162.

21. Hampl, H., Schulze, H. & Nierhaus, K. H. (1981) J. Biol. Chem. 256, 2284-2288.

22. Nomura, M. & Held, W. A. (1974) in Ribosomes, eds. Nomura, M., Tissieres, A. & Lengyel, P. (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY), pp. 193-223.

23. Heinemeyer, W., Alt, J. & Herrmann, R. G. (1984) Curr. Genet. 8, 543-549.

24. Deno, H., Shinozaki, K. & Sugiura, M. (1984) Gene 32, 195-201.



25. Watson, J. C. & Surzycki, S. J. (1983) Curr. Genet. 7, 201-210.
