

THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1985 by The American Society of Biological Chemists, Inc.

Vol. 260, No. 25, Issue of November 5, pp. 13471-13477, 1985 Printed in U.S.A.

Novel Multigene Families Encoding Highly Repetitive Peptide Sequences

SEQUENCE ANALYSES OF RAT AND MOUSE PROLINE-RICH PROTEIN cDNAs*

(Received for publication, March 15, 1985)

Scott Clements‡, Haile Mehansho, and Don M. Carlson§

From the Department of Biochemistry, Purdue University, West Lafayette, Indiana 47907

Multigene families encode the proline-rich proteins that are so prominent in human saliva and are dramatically induced in mouse and rat salivary glands by isoproterenol treatment and by feeding tannins. A cDNA encoding an acidic proline-rich protein of rat has been sequenced (Ziemer, M. A., Swain, W. F., Rutter, W. J., Clements, S., Ann, D. K., and Carlson, D. M. (1984) J. Biol. Chem. 259, 10475-10480). This study presents the nucleotide sequences of five additional proline-rich protein cDNAs complementary to both mouse and rat parotid and submandibular gland mRNAs. Amino acid compositions deduced from the nucleotide sequences are typical for proline-rich proteins: 25-45% proline, 18-22% glycine, and 18-22% glutamine and generally an absence of sulfur-containing amino acids except for the initiator methionine. These proline-rich proteins display unusual repeating peptide sequences of 14-19 amino acids. The derived amino acid sequence of the cDNA insert of plasmid pMP1 from mouse has a 19-amino acid sequence which is repeated four times. The inserts of plasmids pUMP40 and pUMP4, also from mouse, encode 12 and 11 repeats of a 14-amino acid peptide, respectively. These repetitive sequences, and others from rat and mouse cDNAs and from human genomic clones, all show very high homologies and likely evolved from duplication of internal portions of an ancestral gene. Gene conversion could account for the high degree of conservation of nucleotide sequences of the repeat regions. Proteins derived from the nucleotide sequences are all characterized by four general regions: a putative signal peptide, a transition region, the repetitive region, and a carboxyl-terminal region. The 5'-flanking sequences and sequences encoding the putative signal peptides are highly conserved (>94%) in all six cDNAs. This sequence conservation may be important in the regulation of the biosynthesis of these unusual proteins.

* This research was supported in part by United States Public Health Service Grants AM-19175 and HL-25225 and a Purdue University David Ross fellowship (S. C.). This is Journal Paper 10346 from the Purdue University Agricultural Experiment Station. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ Present address: Department of Biochemistry, The University of Texas, Health Science Center at Dallas, 5323 Harry Hines Boulevard, Dallas, TX 75235.

§ To whom reprint requests should be addressed.

Repeated administration of the β-agonist isoproterenol causes hypertrophy and hyperplasia of rat (1) and mouse (2) parotid and submandibular glands. The morphological changes are accompanied by a dramatic increase, or induction, in the synthesis of several unusual proteins which are very high in proline, or proline-rich proteins (3-5). Typically these proteins contain 25-45% proline, 18-22% glutamine, and 18-22% glycine (6). Aromatic and sulfur-containing amino acids are either very low or absent. Generally, the proline-rich proteins can be divided into an acidic group containing 9-11% aspartate and at times phosphate and a basic group containing 7-10% lysine plus arginine. Both groups may be glycosylated. Proline-rich proteins in the parotid glands of both mice (2) and rats (7) are also induced by diets high in tannins. After treatment with isoproterenol, or feeding diets high in tannins, proline-rich proteins can account for over 50% of the total soluble proteins in extracts of rat parotid glands. As expected, this increase in protein synthesis results from very dramatic changes in levels of mRNAs in the glands.¹ Cell-free translations of RNAs from isoproterenol-treated rats (8) and from rats fed high tannin sorghum (7) have shown that proline-rich proteins are the major translation products.

Recently, Ziemer et al. (9) described the preparation and characterization of four rat proline-rich protein cDNA clones and the nucleotide sequence of one of these cDNAs in plasmid pRP33. In this study, we report the nucleotide sequence of another rat proline-rich protein cDNA in plasmid pRP25 and the synthesis and nucleotide sequences of four mouse proline-rich protein cDNAs. Nucleotide and amino acid sequences are compared for homologies. The highly conserved 5'-flanking regions plus the nucleotide sequences encoding the putative signal peptide regions (>94% homology in six different cDNAs) could possibly be integral to the regulatory mechanisms of protein synthesis. Gene conversion is postulated as the mechanism for the strong conservation of nucleotide sequences in the repeat regions.
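The composition ranges quoted above for proline-rich proteins (25-45% proline, 18-22% glycine, 18-22% glutamine) can be checked for any derived protein sequence with a few lines of code. The following is only a minimal sketch: the function names and the 20-residue test sequence are hypothetical and were invented for illustration; they are not taken from the cDNAs described here.

```python
from collections import Counter


def composition_percent(protein: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter protein string."""
    seq = protein.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def looks_like_prp(comp: dict[str, float]) -> bool:
    """Check the Pro/Gly/Gln ranges quoted in the text for proline-rich proteins."""
    return (
        25.0 <= comp.get("P", 0.0) <= 45.0
        and 18.0 <= comp.get("G", 0.0) <= 22.0
        and 18.0 <= comp.get("Q", 0.0) <= 22.0
    )


# Hypothetical sequence, invented for illustration only (not from any figure).
example = "MPPGQPPKPQGPPQGDSQGP"
comp = composition_percent(example)
print(comp["P"], comp["G"], comp["Q"], looks_like_prp(comp))
```

For a real analysis the input would be the full derived amino acid sequence, and the thresholds are only the approximate ranges given in the text.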

EXPERIMENTAL PROCEDURES²

RESULTS

Parotid poly(A+) RNA from isoproterenol-treated mice was enriched for proline-rich protein mRNAs by fractionation on low-melting agarose gel, and RNAs ranging from 500 to 1400 bases were used for cDNA synthesis. A slightly different approach from the normal procedures was taken in the synthesis of mouse proline-rich protein cDNA because of problems encountered in the synthesis of rat proline-rich protein cDNAs. Earlier studies (9) revealed that when the mRNA strand was removed prior to second strand synthesis, the resulting single-stranded cDNA proved to be a poor substrate for second strand synthesis. Mouse cDNA was synthesized by adding dC tails to the cDNA-mRNA heteroduplex, and the dC-tailed duplex was annealed to the PstI-linearized dG-tailed vector DNA. The inefficiency of tailing RNA molecules with terminal transferase (27) was compensated by using a poly(A+) RNA preparation enriched for proline-rich protein mRNAs. Synthesis of the second strand was obtained by using Escherichia coli RNase H, DNA polymerase, and E. coli DNA ligase. The RNase H ensured a "nicked" RNA strand as a substrate for the "nick translation" activity of DNA polymerase, and E. coli ligase was chosen to connect the second strand DNA fragments because of its inability to ligate RNA to DNA (28). A total of 42 mouse cDNAs were obtained which hybridized well to either pRP33 or pRP25. The proline-rich protein cDNAs studied in more detail are presented in Table I. Each cDNA studied is unique as judged by restriction enzyme mapping (Fig. 1).

¹ Proline-rich protein mRNAs comprise about 50% of the total amount of mRNAs in parotid glands of isoproterenol-treated rats as determined by dot-blot analyses (S. Clements and D. M. Carlson, unpublished observations).

² Portions of this paper (including "Experimental Procedures" and Figs. 1, 3-6, and 8) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 85M-771, cite the authors, and include a check or money order for $4.00 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.

Proline-rich Protein cDNA Homologies. Homology relationships of proline-rich protein cDNAs from rat and mouse were determined by dot-blot hybridizations (Fig. 2). Mouse and rat cDNAs were fixed to nitrocellulose filters, and hybridizations were carried out at increasing stringencies with ³²P-labeled plasmids pMP1, pUMP12, and pRP33 as probes.

The hybridization conditions used in these experiments are approximately Tm minus 30 °C (30 °C, low stringency), Tm minus 20 °C (40 °C, moderate stringency), and Tm minus 10 °C (50 °C, high stringency). The cDNA inserts of pMP1 and pRP33 have G + C contents of 56-60% based on their nucleotide sequences. The Tm values were calculated from the equation of Casey and Davidson (29). At 30 °C, each of the ³²P-labeled cDNAs hybridized to all the mouse and rat cDNAs (blots for pUMP12 and pRP33 not shown). Differences begin to appear when the hybridization temperature is increased 10 °C. The mouse probe, pMP1, still annealed to other mouse cDNAs and to pRP25, but the cross-hybrids pMP1-pRP18 and pMP1-pRP33 were lost. At 50 °C, all cross-hybrids were lost except pMP1-pRP25. The nucleic acid sequences of these two cDNAs confirmed this homology (data presented with sequence analysis). pUMP12 hybridized well with pUMP39 and to a lesser extent with pRP33 and pMP1 at 40 °C. At 50 °C, all hybridization was eliminated except self-hybridization and with pUMP39. Clearly, these two mouse cDNAs are closely related. When pRP33 was used as the probe at moderate stringency (40 °C), it hybridized well with pRP18, pUMP12, and pUMP39, but an increase in temperature to 50 °C eliminated all hybridization except to itself. The results from the dot-blot analysis revealed that pUMP40 and pUMP4 were not closely related to any of the probes (Fig. 2). Preliminary amino acid sequence analysis of GP66sm showed substantial differences in homologies with proteins encoded by pRP33, pRP25, and pMP1. These data suggested that pUMP40 or pUMP4 may encode glycoprotein GP66, which is highly induced in mouse salivary glands either by isoproterenol treatment or by feeding tannins (2). Nucleotide sequence analysis supported this relationship.

TABLE I
Plasmids containing proline-rich protein cDNA inserts

Plasmid     Species   cDNA size (base pairs)
pRP18ᵃ      Rat       790
pRP25ᵃ      Rat       530
pRP33ᵃ      Rat       667
pMP1        Mouse     650
pUMP125ᵇ    Mouse     1017
pUMP40      Mouse     717
pUMP4       Mouse     671
pUMP39      Mouse     950
pUMP42      Mouse     725

ᵃ The preparation and identification of these cDNAs were reported by Ziemer et al. (9).

ᵇ The designation U indicates recombinant plasmids derived from pUC8; other plasmids were derived from pBR322. pUMP125 is a combination of pUMP12 and pUMP225.

FIG. 2. Hybridization analysis of rat and mouse cDNAs. Plasmids described in Table I were dot-blotted onto nitrocellulose filters as indicated by the code diagram (inset). The blots were probed with ³²P-labeled cDNA inserts from pRP33, pMP1, or pUMP12 at the three temperatures indicated. Hybridization, washing, and autoradiography were as described under "Experimental Procedures."
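The stringency arithmetic used in the hybridization experiments can be written out explicitly. The sketch below only restates numbers given in the text (30, 40, and 50 °C at 30, 20, and 10 °C below Tm) and adds a generic G + C calculation; it does not reproduce the Casey and Davidson equation, and the function name and the short test sequence are hypothetical.

```python
def gc_percent(dna: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = dna.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# (hybridization temperature, stated offset below Tm), both in degrees C,
# as quoted in the text for low, moderate, and high stringency.
conditions = {"low": (30, 30), "moderate": (40, 20), "high": (50, 10)}

# Each condition implies the same Tm under the hybridization conditions used.
implied_tm = {name: temp + offset for name, (temp, offset) in conditions.items()}
print(implied_tm)

# Hypothetical 8-base sequence, for illustration only.
print(gc_percent("GGCCAATT"))
```

All three conditions imply the same Tm of about 60 °C, which is simply a consistency check on the stated offsets.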

Sequences of Proline-rich Protein. The complete nucleotide sequences of the cDNA inserts of plasmids pMP1, pRP25, pUMP125, pUMP40, and pUMP4 are presented in Figs. 3-6. The cDNA insert of pMP1 (Fig. 3) contains the complete coding sequence for a putative 15-kDa proline-rich protein with 30 nucleotides of 5'-noncoding sequence and 112 nucleotides of 3'-noncoding sequence, plus a poly(A) tail of 70 nucleotides. There is a consensus poly(A) addition site, AATAAAA, 13 nucleotides from the poly(A) tail. The cDNA insert of pRP25 (Fig. 4) contains only a partial coding sequence for a rat proline-rich protein with 15 nucleotides 5' to the initiator methionine codon and 518 nucleotides in the coding region. The remaining 3' sequence was not transcribed, and the molecular weight of the protein encoded by this mRNA is unknown. The dot-blot analysis (Fig. 2) indicated that the nucleotide sequences of the cDNA inserts of pMP1 and pRP25 are quite homologous, and these two cDNAs show 94% homology when the available nucleotide sequences are compared.

The cDNA inserts of pUMP40 (Fig. 5) and pUMP4 (Fig. 6) contain only partial coding sequences for proline-rich proteins. Amino acid sequences obtained from pUMP40 and pUMP4 are shown in Fig. 7. Amino acid compositions of GP66p and GP66sm (2) are essentially the same as those deduced from the nucleotide sequences of pUMP40 and pUMP4. Amino acid sequence analyses of peptides and glycopeptides produced by clostripain and Pronase digestions, respectively, are shown in Fig. 7. The peptide and glycopeptide fractions were obtained after protease digestions, and these were separated on Sephadex G-50 and by high pressure liquid chromatography (2). The sequence analyses of the peptides and glycopeptides correspond with the repeat region of the amino acid sequence deduced from both pUMP40 and pUMP4. The peptide chain of GP66p (and GP66sm) may be encoded by either pUMP40 or pUMP4, or both, but as yet we do not know whether one or two genes are represented. There is a 5'-noncoding region of 33 nucleotides and a 3'-noncoding extension of 93 nucleotides on pUMP4 which does not contain a poly(A) addition site.

FIG. 7. Comparisons of amino acid sequences derived from pUMP40 and pUMP4 cDNA inserts and from sequencing GP66sm. Variations in the repeat regions are indicated by circles. Alignment of repeats in three instances are facilitated by dashes. The amino acid sequences obtained from peptides and glycopeptides produced by clostripain treatment are indicated by the solid underline; sequences obtained by Pronase treatment are shown by the dashed underline. The threonines in the glycopeptides are presumed to be glycosylated since these residues were not detected by sequencing. The Thr:GalNAc is 1.0 (2).
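The repeat structure of the pUMP40 and pUMP4 products lends itself to a simple tabulation: split the repeat region into 14-residue units and list the positions that differ from a consensus. The sketch below is illustrative only. The consensus string is an assumption, taken as the predominant 14-residue unit legible in Fig. 7; the three-unit test region, its single substitution, the example codons, and all function names are hypothetical.

```python
# Assumption: predominant 14-residue repeat as read from Fig. 7.
CONSENSUS = "PPPPGGPQPRPPQG"


def split_repeats(region: str, unit_len: int) -> list[str]:
    """Cut a repeat region into consecutive, non-overlapping units."""
    return [region[i:i + unit_len]
            for i in range(0, len(region) - unit_len + 1, unit_len)]


def substitutions(repeat: str, consensus: str) -> list[tuple[int, str, str]]:
    """List (1-based position, consensus residue, observed residue) differences."""
    return [(i + 1, c, r)
            for i, (c, r) in enumerate(zip(consensus, repeat)) if c != r]


def base_differences(codon_a: str, codon_b: str) -> int:
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


# Hypothetical region: three units, the middle one carrying a Pro -> Ser change.
region = CONSENSUS + "PPPPGGPQSRPPQG" + CONSENSUS
units = split_repeats(region, len(CONSENSUS))
total = sum(len(substitutions(u, CONSENSUS)) for u in units)
print(units, total)

# A Pro codon (CCA) and a Ser codon (TCA) differ at a single base.
print(base_differences("CCA", "TCA"))
```

Applied to the real repeat regions, a tally of this kind is what underlies statements such as the number of substitutions per repeat and how many of them arise from single base changes.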

The nucleotide sequence of the cDNA insert of pUMP125 is shown in Fig. 8. This sequence codes for a proline-rich protein of 31 kDa and includes three nucleotides 5' to the initiator methionine codon, 903 nucleotides of coding sequence, 96 nucleotides of 3'-noncoding sequence, and nine bases of a poly(A) tail. A consensus poly(A) addition site occurs 12 nucleotides 5' to the poly(A) tail. The last 132 nucleotides of the pUMP125 sequence were derived from a cDNA insert of plasmid pUMP225, with the rest of the pUMP125 sequence coming from the cDNA of another plasmid, pUMP12. As indicated, there were 90 bases of exact overlap between the 3' end of the pUMP12 and the 5' end of pUMP225 (Fig. 8). Partial sequencing of the cDNA insert of pUMP39 showed that this cDNA is highly homologous to pUMP125 with single base differences between pUMP39 and pUMP125 throughout the regions of pUMP39 sequenced (data not shown). The portion of pUMP39 sequenced shows 90% homology between pUMP39 and the respective region of pUMP125.
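Combining two cDNA inserts through a region of exact overlap, as described for pUMP12 and pUMP225, amounts to joining a suffix of one sequence to the matching prefix of the other. The following is a minimal sketch of that idea, not the procedure actually used by the authors; the function name and both short sequences are hypothetical.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 1) -> str:
    """Join two sequences at their longest exact suffix/prefix overlap."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            # Keep the overlap once: everything in left, then the rest of right.
            return left + right[k:]
    raise ValueError("no exact overlap of the required length")


# Hypothetical fragments sharing the 8-base overlap CCACCAGG.
left = "ATGGCTCCACCAGG"
right = "CCACCAGGTAAAT"
merged = merge_by_overlap(left, right, min_overlap=6)
print(merged, len(merged))
```

The merged length equals the sum of the two lengths minus the overlap, so a 90-base overlap removes 90 bases from the simple sum of the two inserts.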

Nucleotide Homology at the 5' End of the Proline-rich Protein cDNAs. The first 70 nucleotides in the coding regions of the proline-rich protein cDNAs, containing mainly the nucleotides encoding the putative signal peptide (Fig. 7), and available sequences of the 5'-noncoding regions show an unusually strong homology (>94%) for six different cDNAs that encode six different proteins (Fig. 9). Presently we have not determined the extent of N-terminal processing of the primary translation products. Earlier studies clearly demonstrated that the initiator Met is removed in vivo (8), but six of the rat and mouse proline-rich proteins investigated so far have blocked N termini. We can only surmise that these proline-rich proteins, which are secreted into the oral cavity, have about 15-19 amino acids removed by a signal peptidase. Of the first 19 amino acids encoded by pMP1 (Fig. 3), the first 13 are hydrophobic and four of the next seven (14-20) are Ser, or Gly, which are amino acids frequently found in the region hydrolyzed by the signal peptidases (30).
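Homology figures such as ">94%" are percent identities over aligned positions. A minimal sketch of that calculation is given below, assuming the two sequences have already been aligned to equal length with "-" marking gaps; the function name and the 20-base test sequences are hypothetical and are not copied from Fig. 9.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned sequences.

    Positions where either sequence has a gap ("-") are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned 20-base sequences differing at one position.
seq_a = "ATGCTGGTGGTCCTGTTCAC"
seq_b = "ATGCTGGTGGTCCTCTTCAC"
print(percent_identity(seq_a, seq_b))
```

How gaps are scored is a choice; ignoring gapped positions, as done here, is only one convention and will give higher values than counting gaps as mismatches.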

The Transition Region. A segment termed the "transition region" is located between the highly homologous 5' regions and the characteristic and distinctive repeat regions (Fig. 7). This transition region is highly variable in both the mouse and rat cDNAs. The transition region of the cDNA insert of pRP33 (amino acids 19-79) contains an unusually acidic sequence with 10 Asp residues within residues 58-73 (9). This acidic region contains two Asp-Gly-Asp-Asp repeats which show complete homology in nucleotide sequence. The transition region of pUMP125 is similar to that of pRP33 both in sequence and size and contains 10 acidic residues within the amino acid sequence 60-80 (Fig. 8). pUMP125 may be the mouse homolog to pRP33. The length of the transition region varies from about 13 amino acids in pMP1 and pRP25 to 60 in pRP33. Generally, the transition region contains almost all of the acidic amino acid residues.

The Repeat Region. pRP33 contains six repeats of 18-19 amino acids, with a high level of codon conservation (9). Generally, repeat regions in mouse and rat proline-rich proteins contain 14-19 amino acids. The consensus repeats of the various plasmid inserts are aligned in Fig. 10. In the 24 repeats of the 14-amino acid peptides (Fig. 7), there are 28 amino acid substitutions, and 25 of these result from single base changes. Apparently all threonines in the repeats are glycosylated (2). The repeat region generally is highly basic.

The Carboxyl-terminal Region. There appears to be little, if any, conservation or homology in the carboxyl-terminal regions of the various derived peptide sequences. In two plasmids, pUMP125 and pRP33, however, there is a cluster of aromatic amino acids and leucine in the final six or seven amino acids (pUMP125, -YLFSLFA; pRP33, -YLWFS(A)S). In PUMP1 and pUMP4, remnants of the repeat regions seem to be present in this region.

FIG. 9. Nucleotide homologies at the 5' end of the six rat and mouse plasmid DNAs. The initiation codons are aligned with the A designated as base 1.

FIG. 10. Comparison of repeat sequences of rat, mouse, and human proline-rich proteins. Designations indicate if the sequence was derived from a cDNA sequence (CS), a genomic sequence (GS), or the peptide sequence (PS). The derived consensus sequences of pRP33 (9) and pUMP40/4 (Fig. 7) have been confirmed by peptide sequencing.

DISCUSSION

The data presented by these studies clearly show that the proline-rich proteins in mouse and rat constitute multigene families. We have prepared and sequenced several cDNAs which encode proline-rich proteins. At low stringency, all of these cDNAs showed cross-hybridization, but as the hybridization stringency was increased, relationships between these cDNAs could be discerned. The cDNA pair, pUMP125 and pRP33, are likely to be the mouse-rat homologs. Examination of their nucleotide sequences showed substantial homology throughout the transition and repeat regions, and these hybridization data were what would be expected for homologous members of a multigene family.

The nucleotide sequences of six proline-rich protein cDNAs have been completed. In each case, the derived amino acid sequences can be divided into four regions: the putative signal peptide, the transition region, the repeat region, and the carboxyl-terminal region. The signal peptide and the repeat regions were highly conserved between the rat and mouse cDNAs, while the transition and carboxyl-terminal regions showed much less conservation even between what was likely to be rat-mouse homologs. While it would be interesting to analyze the derived protein sequences of the proline-rich protein cDNAs by the Chou and Fasman procedure (31), the high levels of proline and glycine in proline-rich proteins would likely bias the interpretations (9).

The 5'-noncoding sequences and the first 70 bases of coding sequence from six rat and mouse proline-rich protein cDNAs are compared in Fig. 9. The initiation codons are aligned. Homology in this portion of these cDNAs is striking, especially when considering that the cDNAs cross-hybridize only at low stringency (Fig. 2) and that the remaining regions diverge in sequence. Homology in this 5' region is as high as 99% (pRP25-pMP1) and is not lower than 70% (pRP18-pUMP125). The homology extends through the nucleotides encoding the putative signal peptide, and this conservation of sequence may have resulted from constraints of coding for hydrophobic amino acids. However, conservation of the signal peptide in other secreted proteins which are related, but different, has not been observed (30). Within the first 100 bases, there were imperfect inverted repeats that may form stem-loop structures in the mRNAs. The stem-loop structures have ΔG values of -6 to -8 kcal as calculated by the formula of Tinoco et al. (32), but the biological significance of these structures, if they occur in vivo, is not known. A sequence of about 50 nucleotides surrounding the translation initiation codon for α1 type I and α1 type III collagen mRNAs is also highly conserved, and the translational control of collagen synthesis by these remarkably conserved sequences was proposed (33). The most thermodynamically favored configurations for pMP1 and pUMP40 showed the initiation codon in the loop portion, whereas in the collagen mRNAs two AUGs were directly involved in each inverted sequence hybridization as part of the stem portion.

While the transition region showed little sequence conservation (nucleotide or derived amino acid sequence) between proline-rich protein cDNAs, the repeat regions of proline-rich proteins, derived both from the nucleotide sequences of cDNAs and a human genomic clone and from peptide sequencing, showed the close relationships of these unusual proteins. The human genomic sequence was established on subclones screened by the rat proline-rich protein cDNA, pRP33 (34). Frequent polymorphisms in EcoRI digests were observed with human DNA, typified by doublets with less than 1 kilobase difference between the components of each pair. Frequent length polymorphisms suggest that deletions or insertions occur readily in the proline-rich protein genes. These polymorphisms were aptly illustrated by comparing the sequences of pMP1 and pRP25. Through amino acid 70, the two sequences are virtually identical, then pRP25 contains 19 amino acids not present in pMP1, and then sequence homology resumes. The number of repeats in PRP³ sequences appears to be variable and may be the principal reason for these polymorphisms.

The repetitive nature of the proline-rich protein mRNAs implies extensive duplication of an ancestral gene(s), similar to the collagen genes where the genes consist of many 45-108-base pair exons, and each exon begins with a Gly codon and ends with a Gly codon (35). An apparent splice junction has been located in the human genomic sequence (33), and the possible splicing sequences of the proline-rich protein mRNAs have been discussed. From the partial sequence of the human genomic clone, the repeat sequences apparently were not each separated by introns. Recent evidence on the sequence from a mouse genomic clone shows two introns (36). Intron 1 separates the putative signal peptide region from the entire structural region. Intron 2 is in the 3'-noncoding region. Southern blots of rat and mouse genomic DNA have indicated that there are approximately 25-30 kilobases of DNA which hybridize to ³²P-labeled PRP cDNA (5). On the average, each gene (possibly 10 in the rat and seven in the mouse) would occupy only 3-5 kilobases of DNA. This would dictate that the genes are compact with few and relatively small introns such as in the silk moth chorion gene family where it is estimated that there are 230 structural genes in 900 kilobases (37). Alternatively, there may not be separate or distinct genes for each PRP mRNA. A few genes could generate several mRNAs by differential splicing. Multiple mRNAs from a single gene have been observed in the mouse amylase gene (38), the dihydrofolate reductase gene (39), and the chicken vimentin gene (40).
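The gene-density estimate above is a simple division of hybridizing DNA by gene number. The sketch below reproduces that arithmetic using only the figures quoted in the text (25-30 kilobases, about 10 rat and 7 mouse genes, and 230 chorion genes in 900 kilobases); the function name is hypothetical.

```python
def kb_per_gene(total_kb: float, n_genes: int) -> float:
    """Average DNA length per gene, in kilobases."""
    return total_kb / n_genes


# Bounds of the hybridizing DNA estimate, in kilobases, from the text.
hybridizing_kb = (25, 30)

rat = [kb_per_gene(kb, 10) for kb in hybridizing_kb]    # about 10 genes
mouse = [kb_per_gene(kb, 7) for kb in hybridizing_kb]   # about 7 genes
chorion = kb_per_gene(900, 230)                          # silk moth comparison

print(rat, [round(x, 1) for x in mouse], round(chorion, 1))
```

These averages come out at a few kilobases per gene, of the same order as the 3-5 kilobases stated in the text and as the chorion gene family used for comparison.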

By utilizing 32P-labeled PRP cDNA clones, the PRP genes in mouse have been localized to mouse chromosome 8 at the proximal portion (toward the centromere) (41). These data support genetic data from humans (42) that PRP genes are clustered rather than being scattered throughout the genome. This clustering may be important to the regulatory aspects of this highly polymorphic PRP family.

Northern blots show that, with possibly one exception, proline-rich protein mRNAs are dramatically increased by isoproterenol treatment and that the increase of proline-rich proteins is likely regulated principally by the elevation of these mRNAs (2). By analogy with other systems, such as vitellogenin (43), ovalbumin (44), transferrin (45), and lactate dehydrogenase (46) syntheses, the increase in protein synthesis may also be regulated by the stability of the mRNAs or by translational control factors. The unusual conservation of nucleotide sequence surrounding the AUG initiation codon may contribute to specific regulatory functions, such as proposed for the collagen mRNAs (32).

Acknowledgments-We thank Dr. Mark Hermodson and Scott Buckel for performing the amino acid sequencing and Boklye Choi for preparing the glycopeptides.

3 The abbreviations used are: PRP, proline-rich protein; TE, 10 mM Tris-HCl (pH 7.5) containing 1 mM EDTA; 1 × SSC, 0.15 M NaCl, 0.015 M sodium citrate (pH 7.0); SDS, sodium dodecyl sulfate.

REFERENCES

1. Barka, T. (1965) Exp. Cell Res. 37, 662-679
2. Mehansho, H., Clements, S., Sheares, B. T., Smith, S., and Carlson, D. M. (1985) J. Biol. Chem. 260, 4418-4423
3. Fernandez-Sorenson, A., and Carlson, D. M. (1974) Biochem. Biophys. Res. Commun. 60, 249-256
4. Muenzer, J., Bildstein, C., Gleason, M., and Carlson, D. M. (1979) J. Biol. Chem. 254, 5623-5628
5. Mehansho, H., and Carlson, D. M. (1983) J. Biol. Chem. 258, 6616-6620
6. Muenzer, J., Bildstein, C., Gleason, M., and Carlson, D. M. (1979) J. Biol. Chem. 254, 5629-5634
7. Mehansho, H., Hagerman, A., Clements, S., Butler, L., Rogler, J., and Carlson, D. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 3948-3952
8. Ziemer, M. A., Mason, A., and Carlson, D. M. (1982) J. Biol. Chem. 257, 11176-11180
9. Ziemer, M. A., Swain, W. F., Rutter, W. J., Clements, S., Ann, D. K., and Carlson, D. M. (1984) J. Biol. Chem. 259, 10475-10480
10. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979) Biochemistry 18, 5294-5299
11. Aviv, H., and Leder, P. (1972) Proc. Natl. Acad. Sci. U. S. A. 69, 1408-1412
12. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
13. Bailey, J. M., and Davidson, N. (1976) Anal. Biochem. 70, 75-85
14. Gordon, J. I., Burns, A. T. H., Christman, J. L., and Deely, R. G. (1978) J. Biol. Chem. 253, 8629-8639
15. Okayama, H., and Berg, P. (1982) Mol. Cell. Biol. 2, 161-170
16. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, p. 239, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
17. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, p. 250, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
18. Grunstein, M., and Hogness, D. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 3961-3965
19. Kafatos, F. C., Jones, C. W., and Efstratiadis, A. (1979) Nucleic Acids Res. 7, 1541-1552
20. Maniatis, T., Jeffery, A., and Kleid, D. G. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 1184-1188
21. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, p. 122, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
22. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, p. 148, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
23. Maxam, A., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
24. Larson, R., and Messing, J. (1982) Nucleic Acids Res. 10, 39-49
25. Kauffman, D., Wong, R., Bennick, A., and Keller, P. (1982) Biochemistry 21, 6558-6562
26. Hermodson, M., Schmer, G., and Kurachi, K. (1977) J. Biol. Chem. 252, 6276-6279
27. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, p. 221, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
28. Higgins, N. P., and Cozzarelli, N. R. (1979) Methods Enzymol. 68, 50-71
29. Casey, J., and Davidson, N. (1977) Nucleic Acids Res. 4, 1539-1552
30. Kreil, G. (1981) Annu. Rev. Biochem. 50, 317-348
31. Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13, 222-245
32. Tinoco, I., Uhlenbeck, O. C., and Levine, M. D. (1971) Nature 230, 362-367
33. Yamada, Y., Mudryj, M., and de Crombrugghe, B. (1983) J. Biol. Chem. 258, 14914-14919
34. Azen, E., Lyons, K. M., McGonigal, T., Barrett, N. L., Clements, L. S., Maeda, N., Vanin, E. F., Carlson, D. M., and Smithies, O. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 5561-5565
35. Yamada, Y., Avvedimento, V. E., Mudryj, M., Ohkubo, H., Vogeli, G., Irani, M., Pastan, I., and de Crombrugghe, B. (1980) Cell 22, 887-892
36. Ann, D. K., and Carlson, D. M. (1985) J. Biol. Chem. 260, in press
37. Eickbush, T. H., and Kafatos, F. C. (1982) Cell 29, 633-639
38. Young, R. A., Hagenbuchle, O., and Schibler, U. (1981) Cell 23, 451-458
39. Setzer, D. R., McGrogan, M., Nunberg, J. H., and Schimke, R. T. (1980) Cell 22, 361-370
40. Capetanaki, Y. G., Ngai, J., Flytzanis, C. N., and Lazarides, E. (1983) Cell 35, 411-418
41. Azen, E., Carlson, D. M., Clements, S., Lalley, P. A., and Vanin, E. (1984) Science 226, 967-969
42. Azen, E., and Yu, P. L. (1984) Biochem. Genet. 22, 1-4
43. Baker, H. J., and Shapiro, D. J. (1978) J. Biol. Chem. 253, 4521-4524
44. Tsai, S. Y., Roop, D. R., Tsai, M.-J., Stein, J. P., Means, A. R., and O'Malley, B. W. (1978) Biochemistry 17, 5773-5780
45. Lee, D. C., McKnight, G. S., and Palmiter, R. D. (1978) J. Biol. Chem. 253, 3494-3503
46. Miles, M. F., Hung, P., and Jungmann, R. A. (1981) J. Biol. Chem. 256, 12545-12552
47. Wong, R. S. C., and Bennick, A. (1980) J. Biol. Chem. 255, 5943-5948
48. Shimomura, H., Kanai, Y., and Sanada, K. (1983) J. Biochem. (Tokyo) 93, 857-863

Supplementary Material to

Novel Multi-Gene Families Encoding Highly Repetitive Peptide Sequences:

Sequence Analyses of Rat and Mouse Proline-rich Protein cDNAs

Scott Clements, Haile Mehansho, and Don M. Carlson

EXPERIMENTAL PROCEDURES

Materials - The following substances were purchased from the respective companies: restriction enzymes, terminal deoxynucleotidyl transferase, RNase H, and oligo-d(T) cellulose, Bethesda Research Laboratories, Inc.; restriction enzymes, E. coli DNA polymerase, Klenow fragment, and E. coli DNA ligase, New England Biolabs; avian myeloblastosis virus reverse transcriptase, Life Sciences, Inc.; calf intestine alkaline phosphatase and polynucleotide kinase, Boehringer-Mannheim; [3H]proline, [35S]methionine, [α-32P]dCTP, and [3H]dCTP, either Amersham or New England Nuclear; [γ-32P]ATP, ICN. Other materials were purchased from commercial sources and were of the highest purity available.

Isoproterenol Treatment - Rats and mice were treated with isoproterenol as described by Muenzer et al. (4) and by Mehansho et al. (2), respectively. Treatment was for 10 days unless otherwise indicated.

Preparation of RNA and Plasmid DNA - RNA was isolated by the guanidine thiocyanate-cesium chloride procedure of Chirgwin et al. (10). Poly(A+) RNA was prepared by chromatography on oligo-(dT) cellulose as described by Aviv and Leder (11). Plasmid DNA was first purified by the alkaline lysis method (12) and further purified on cesium chloride gradients.

Enrichment for Proline-rich Protein Poly(A+) RNA - Poly(A+) RNA (20 µg) was isolated from mouse parotid glands and was subjected to electrophoresis on a 1.25% low-melting point agarose gel containing 5 mM methyl mercury hydroxide (13). After electrophoresis the lanes were cut, removed, and soaked in 100 mM dithiothreitol for 30 min. The lanes were then cut into 3-mm slices and each slice placed in sterile Eppendorf tubes. Five gel-volumes of sterile 0.5 M ammonium acetate were added to each tube and the agarose melted at 65 °C. The solution was extracted 3 times with phenol:chloroform:isoamyl alcohol (50:48:2, v/v), and once with chloroform. After ethanol precipitation, the RNA was washed twice with 70% ethanol and dried. The RNA was dissolved in 10 µl of sterile water and the appropriate RNA fractions were pooled for cDNA synthesis. These fractions were also identified as containing proline-rich protein mRNAs by cell-free translation experiments as described previously (9).

Synthesis of Proline-rich Protein cDNA - cDNAs were synthesized as described previously (14). Reaction mixtures were extracted with an equal volume of phenol:chloroform:isoamyl alcohol (50:48:2, v/v). The cDNA was precipitated by the addition of an equal volume of 4 M ammonium acetate and 4 volumes of ethanol (relative to original volume), and chilled on dry ice for 15 min. The samples were warmed to room temperature to dissolve unincorporated deoxynucleotide triphosphates, and the cDNA was pelleted by centrifugation. The pellet was dissolved in TE (pH 7.5), and precipitated with ethanol a second time. This procedure removes 99% of the unreacted deoxynucleotide triphosphates (15).
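The volumes for the ammonium acetate-ethanol precipitation follow directly from the sample volume, and the small Python sketch below works them out. It assumes, as the text states, one sample volume of 4 M ammonium acetate and four sample volumes of ethanol, and it assumes volumes are simply additive; the function name and the 50-µl example are illustrative and are not taken from the paper.

```python
def precipitation_volumes(sample_ul, acetate_stock_molar=4.0):
    """Volumes (µl) for one precipitation, relative to the original sample volume."""
    acetate_ul = sample_ul        # an equal volume of 4 M ammonium acetate
    ethanol_ul = 4 * sample_ul    # 4 volumes of ethanol, relative to the original volume
    total_ul = sample_ul + acetate_ul + ethanol_ul
    # Ammonium acetate concentration in the final mixture, assuming additive volumes.
    acetate_final_molar = acetate_stock_molar * acetate_ul / total_ul
    return {
        "ammonium_acetate_ul": acetate_ul,
        "ethanol_ul": ethanol_ul,
        "total_ul": total_ul,
        "acetate_final_molar": acetate_final_molar,
    }


# Hypothetical 50-µl reaction, for illustration only.
print(precipitation_volumes(50))
```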

Homopolymer Tailing and Annealing of cDNA - Homopolymer tails were added to cDNA:mRNA hybrids in a reaction mixture containing [illegible] mM potassium cacodylate, 1 mM CoCl2, 0.1 mM dithiothreitol, 1 mM dCTP, 20 µg/ml oligo d(T), and 500 units/ml terminal deoxynucleotidyl transferase in a total reaction volume of 10 µl. The mixture was incubated at 37 °C for 30 min and then extracted with phenol/chloroform. The products were precipitated with ethanol. Homopolymer tails of dG were added to the PstI sites of pBR322 and pUC8 as described previously (16). The dC-tailed cDNA-mRNA hybrids (0.2 µg) were annealed to dG-tailed vector DNA (0.6 µg) by mixing the two in 10 µl of a solution consisting of 100 mM NaCl, 10 mM Tris-HCl (pH 7.5), and 1 mM EDTA. The solution was incubated at 65 °C for 2 min, at [illegible] °C for 2 h, and then placed on ice.

Synthesis of cDNA Second Strand - The mRNA strand was replaced and the second cDNA strand synthesized by using RNase H, DNA polymerase I, and E. coli DNA ligase (15). The solution of annealed cDNA-mRNA vector was adjusted to contain [illegible] mM Tris-HCl (pH 7.5), 5 mM MgCl2, 10 mM β-mercaptoethanol, 40 µM dNTPs, 0.15 mM β-nicotinamide-adenine dinucleotide, 2 units E. coli RNase H, 10 units E. coli DNA ligase, and 25 units E. coli DNA polymerase I in a final volume of 100 µl. The mixture was incubated for 1 h at [illegible] °C, then 1 h at room temperature. The solution was then diluted to 600 µl with ice-cold TE (pH 7.5).

Transformation and Colony Selection - E. coli strains HB101 (for pBR322 plasmids) and TB1 (for pUC8 plasmids) were calcium-shocked and transformed by published procedures (17). One-hundred microliters from the diluted second strand synthesis reaction mixture was used. Transformed HB101 cells were plated on L-agar containing 15 µg/ml tetracycline, and TB1 transformants were plated on L-agar containing 150 µg/ml ampicillin and 40 µg/ml X-gal. L-Agar contained 5 g/l NaCl, 5 g/l yeast extract, 10 g/l tryptone, and 14 g/l agar. Tetracycline-resistant/ampicillin-sensitive HB101 colonies and ampicillin-resistant/β-galactosidase-negative TB1 colonies were screened by the procedure of Grunstein and Hogness (18). 32P-Labeled cDNA inserts from pRP25 and pRP33 were used as probes.
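Scaling the L-agar recipe to a working volume is a matter of proportion. The Python sketch below uses the per-liter amounts given in the text (5 g/l NaCl, 5 g/l yeast extract, 10 g/l tryptone, 14 g/l agar); the function name and the 500-ml batch size are illustrative assumptions, and antibiotics and X-gal are deliberately left out because they are added at the concentrations stated above rather than by mass per liter of base medium.

```python
# Grams per liter, as stated in the text for L-agar.
L_AGAR_G_PER_L = {
    "NaCl": 5.0,
    "yeast extract": 5.0,
    "tryptone": 10.0,
    "agar": 14.0,
}


def scale_recipe(volume_ml, recipe=L_AGAR_G_PER_L):
    """Return grams of each component needed for `volume_ml` of medium."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    return {name: grams * volume_ml / 1000 for name, grams in recipe.items()}


# Hypothetical 500-ml batch, for illustration only.
for name, grams in scale_recipe(500).items():
    print(f"{name}: {grams:g} g")
```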

Dot-Blot Analysis of cDNAs - Plasmids linearized with HindIII were fixed to nitrocellulose filters using the procedure of Kafatos (19), with a dot-blot manifold (Bethesda Research Laboratories). After baking the filters at 80 °C for 2 h, the filters were incubated in 70% deionized formamide, 2.0 × SSC, 0.1% SDS, 100 µg/ml sonicated, heat-denatured salmon sperm DNA, 0.1% bovine serum albumin, 0.1% polyvinylpyrrolidone, and 0.1% Ficoll for 4 h at 50 °C. The solution was replaced with the same solution which contained [32P]cDNA inserts labeled by nick-translation (20). The filters were incubated at 30°, 40°, or 50° for 12-36 h and were then washed in 1 × SSC/0.1% SDS 3 times at room temperature and then 3 times in 0.1 × SSC/0.1% SDS at [illegible] °C. Each wash was for 15 min. The filters were blotted dry, wrapped in cellophane, and exposed to Kodak XAR-5 film with a Cronex intensifying screen at -80 °C for 6-18 h.
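The wash and hybridization solutions are specified as multiples of 1 × SSC, which this paper defines as 0.15 M NaCl, 0.015 M sodium citrate (pH 7.0). The Python sketch below converts an SSC strength into molarities and estimates how much concentrated stock a given wash needs. The 20 × stock strength is an assumption (a commonly used concentrate) and is not stated in the paper; the function names and the 500-ml wash volume are likewise illustrative.

```python
# 1 x SSC as defined in this paper: 0.15 M NaCl, 0.015 M sodium citrate (pH 7.0).
NACL_M_AT_1X = 0.15
CITRATE_M_AT_1X = 0.015


def ssc_molarity(strength):
    """Return (NaCl M, sodium citrate M) for an SSC solution of the given strength."""
    return strength * NACL_M_AT_1X, strength * CITRATE_M_AT_1X


def stock_volume_ml(final_volume_ml, target_strength, stock_strength=20.0):
    """Volume of concentrated SSC stock to dilute to `final_volume_ml`.

    stock_strength=20.0 is an assumption, not a value taken from the paper.
    """
    if target_strength > stock_strength:
        raise ValueError("target strength exceeds stock strength")
    return final_volume_ml * target_strength / stock_strength


for strength in (2.0, 1.0, 0.1):
    nacl, citrate = ssc_molarity(strength)
    print(f"{strength:g} x SSC: {nacl:.4f} M NaCl, {citrate:.5f} M sodium citrate")

# Hypothetical 500-ml wash at 0.1 x SSC.
print(stock_volume_ml(500, 0.1), "ml of stock")
```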

DNA Sequencing - Dephosphorylated DNA restriction fragments were labeled on the 5' ends using [γ-32P]ATP and polynucleotide kinase (21). The 3' ends were labeled using terminal deoxynucleotidyl transferase and [α-32P]dideoxyadenosine triphosphate (22) or with [α-32P]dNTPs and Klenow fragment. DNA sequencing was done by the chemical method of Maxam and Gilbert (23).

Computer Sequence Analysis - All computer sequence analyses were performed on a Basis 108 microcomputer or an Apple IIe microcomputer using the DNA sequence analysis programs described by Larson and Messing (24).
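Deriving an amino acid sequence from a cDNA sequence, as done for the inserts reported here, amounts to reading codons in frame against the standard genetic code. The Python sketch below illustrates the idea only; it is not the Larson and Messing program used in this study, the function name is illustrative, and the short input sequence is a hypothetical example rather than data from any of the plasmids.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence from its first base, stopping at a stop codon.

    Accepts DNA or RNA letters; any other character raises KeyError.
    Trailing bases that do not complete a codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical short sequence, for illustration only.
print(translate("ATGCCACCACCAGGC"))
```

The result is a one-letter amino acid string, from which residue counts and repeat patterns can be tallied.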

Preparation of Glycopeptides - The purification of proline-rich glycoproteins, GP66p and GP66sm, and preparation of peptides and glycopeptides by Pronase digestion have been described (2). Clostripain treatment of GP66p and GP66sm was carried out as described by Kauffman et al. (25). The clostripain digest was subjected to Sephadex G-50 chromatography and the glycopeptide and peptide peaks were pooled separately.

Peptide Sequence Analysis - Glycopeptides and peptides were sequenced according to Hermodson et al. (26).

Fig. 1. Restriction Maps and Sequencing Strategy for cDNA Inserts. Only selected restriction sites used for sequencing are shown. Horizontal arrows indicate the direction of sequencing and the length of sequence obtained.

Fig. 3. Complete Nucleotide Sequence of pMP1 cDNA Insert. The derived amino acid sequence is numbered beginning with the initiator methionine.

Fig. 4. Complete Nucleotide Sequence of pRP25 cDNA Insert. The derived amino acid sequence is numbered beginning with the initiator methionine.

Fig. 5. Complete Nucleotide Sequence of pUMP40 cDNA Insert. The derived amino acid sequence is numbered beginning with the initiator methionine.

Fig. 6. Complete Nucleotide Sequence of pUMP4 cDNA Insert. The derived amino acid sequence is numbered beginning with the first Pro.

Fig. 8. Complete Nucleotide Sequence of pUMP125 as compiled from the sequences of pUMP12 and pUMP225 (see Fig. 1). The completely homologous overlapping region is underlined. The derived amino acid sequence is numbered beginning with the initiator methionine.