THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 266, No. 30, Issue of October 25, pp. 20538-20543, 1991

© 1991 by The American Society for Biochemistry and Molecular Biology, Inc. Printed in U.S.A.

Structure and Expression of the Gene Encoding Cystatin D, a Novel Human Cysteine Proteinase Inhibitor*

(Received for publication, May 5, 1991)

José P. Freije‡, Magnus Abrahamson§, Isleifur Olafsson§, Gloria Velasco, Anders Grubb§, and Carlos López-Otín¶

From the Departamento de Biología Funcional, Universidad de Oviedo, 33071 Oviedo, Spain and the §Department of Clinical Chemistry, University of Lund, University Hospital, S-22185 Lund, Sweden

A new member of the human cystatin multigene family has been cloned from a genomic library using a cystatin C cDNA probe. The complete nucleotide sequence of a 4.3-kilobase DNA segment, containing a complete gene with structure very similar to those of known Family 2 cystatin genes, was determined. The novel gene, called CST4, is composed of three exons and two introns. It contains the coding information for a protein of 142 amino acid residues, which has been tentatively called cystatin D. The deduced amino acid sequence includes a putative signal peptide and presents 51-55% identical residues with the sequences of either cystatin C or the secretory gland cystatins S, SN, or SA. The cystatin D sequence contains all regions of relevance for cysteine proteinase inhibitory activity and also the 4 cysteine residues that form disulfide bridges in the other members of cystatin Family 2. Northern blot analysis revealed that the cystatin D gene is expressed in parotid gland but not in seminal vesicle, prostate, epididymis, testis, ovary, placenta, thyroid, gastric corpus, small intestine, liver, or gallbladder tissue. This tissue-restricted expression is in marked contrast with the wider distribution of all the other Family 2 cystatins, since cystatin C is expressed in all these tissues and the secretory gland cystatins are present in saliva, seminal plasma, and tears. Cystatin D, being the first described member of a third subfamily within the cystatin Family 2, thus appears to have a distinct function in the body in contrast to other cystatins.

The cystatins are a group of proteins, present in a variety of human tissues and body fluids, that have the ability to inhibit cysteine proteinases. They should therefore have an important regulatory role not only in physiological processes in which cysteine proteinases are involved, e.g. bone resorption (Delaissé et al., 1984), but also in the control of pathophysiological conditions such as sepsis (Assfalg-Machleidt et al., 1988), metastasizing cancer (Pietras et al., 1978; Koppel et al., 1984), and the local inflammatory processes in rheumatoid arthritis (Mort et al., 1984) and purulent bronchiectasis (Buttle et al., 1990). Based on structural and functional comparisons, the cystatins constitute a single evolutionary protein superfamily that can be classified into at least three different families of closely related members (Barrett et al., 1986a; Rawlings and Barrett, 1990). Family 1 cystatins contain about 100 amino acid residues (Mr 11,000-12,000) and lack disulfide bridges. Family 2 cystatins are about 120 amino acids long (Mr 13,000-14,000) and have two intrachain disulfide bonds. Finally, Family 3 cystatins, kininogens, represent the most complex members of this protein superfamily and contain three cystatin-like domains. Each of these three domains has two disulfide bonds at positions homologous to those in Family 2 cystatins (Müller-Esterl et al., 1986).

* This work was supported by grants from Comisión Interministerial para el Desarrollo de la Investigación Científica y Técnica (Spain), Kungliga Fysiografiska Sällskapet i Lund, Direktör A. Påhlssons Stiftelse, A. Österlunds Stiftelse, G. och J. Kocks Stiftelse, T. och E. Segerfalks Stiftelse, T. Nilson's Foundation, the Swedish Society of Medicine, the Medical Faculty of the University of Lund, the Swedish Medical Research Council (Projects 05196, 09291, and 09915), Heilavernd Society, and the B. Magnusdottir and J. J. Bjarnason Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) X59964.

‡ Recipient of a fellowship from the Ministerio de Educación y Ciencia, Spain.

¶ To whom correspondence should be addressed. Tel.: 34 85 10 35 63; Fax: 34 85 23 22 55.

At present, four human Family 2 cystatins have been isolated and characterized. The first of these inhibitors to be sequenced was cystatin C, originally isolated from urine (Grubb and Lofberg, 1982) but widely distributed in tissues and biological fluids (Abrahamson et al., 1986). This inhibitor is synthesized as a preprotein with a signal peptide, indicating that the functions of the protein are mainly extracellular (Abrahamson et al., 1987a). Cystatin C plays an important role in the development of cerebral hemorrhage in patients affected by a hereditary form of amyloid angiopathy (Ghiso et al., 1986). In these patients, a variant of cystatin C is deposited as amyloid fibrils in the walls of the cerebral arteries, leading to dementia, brain hemorrhage, and death in young adults (Lofberg et al., 1987; Palsdottir et al., 1988).

The three other human Family 2 cystatins whose amino acid sequences have been described, namely cystatins S, SN, and SA, have been isolated from saliva and show a close relationship, with about 90% identical residues in pairwise comparisons (Isemura et al., 1984, 1986, 1987). All of them have also been found in other body fluids, like urine, tears, and seminal plasma, but it seems that their extracellular distribution is more restricted than that of cystatin C (Abrahamson et al., 1986). These three "secretory gland cystatins" are composed of 121 amino acid residues, sharing about 50% identical residues with the cystatin C sequence.

The similarities in protein structure between Family 2 cystatins are reflected by cross-hybridization at the DNA level. Hybridization with cystatin C cDNA to restriction endonuclease digests of genomic DNA has suggested that the Family 2 cystatins are encoded by a multigene family probably composed of seven members (genes or pseudogenes) (Abrahamson et al., 1990). Similar results have also been obtained by hybridizations of cystatin SN genomic or cDNA probes to human DNA (Saitoh et al., 1987; Al-Hashimi et al., 1988). To date, the complete nucleotide sequences of three of these genes are known: the cystatin C gene called CST3 (Abrahamson et al., 1990), the cystatin SN gene (CST1), and one cystatin pseudogene called CSTP1 (Saitoh et al., 1987). In addition, the sequences of the exon parts of the cystatin SA gene called CST2 have been reported (Saitoh et al., 1987). It thus seems clear that additional members of the gene family remain to be found. In this study, we have used a cystatin C cDNA probe to screen a human genomic library for such additional members. We describe the cloning of a gene coding for a novel cysteine proteinase inhibitor, here designated cystatin D. The complete nucleotide sequence of this new member of the cystatin multigene family, called CST4, is presented as well as data showing its tissue-restricted expression.

MATERIALS AND METHODS¹

RESULTS AND DISCUSSION

Cloning of the Gene Encoding Cystatin D - The human cystatin gene family, encoding Family 2 cystatins, seems to be composed of at least seven genes or pseudogenes (Saitoh et al., 1987; Al-Hashimi et al., 1988; Abrahamson et al., 1990). At present, three genes and a pseudogene belonging to the family have been cloned and sequenced. In order to identify additional members of this multigene family, a human genomic library was screened with a full-length human cystatin C cDNA probe (Abrahamson et al., 1987a). Five out of approximately 5 × 10⁵ λ-phage clones were selected on the basis of their positive hybridization to the probe (clones ChC1 to ChC5). Recombinant phage DNA was isolated from the clones and then analyzed by cleavage with restriction endonucleases. DNA from clone ChC5 was initially selected for further nucleotide sequence analysis, since its restriction map showed major differences with the corresponding ones for previously identified cystatin genes. A single EcoRI fragment of 4.6 kb² from ChC5 DNA (ChC5-E), corresponding to the 5'-end of the insert, hybridized to the cystatin C cDNA probe. It was subcloned in pUC18, and the complete restriction map of the fragment was determined (Fig. 1). Nucleotide sequence analysis showed that this fragment contained, close to its 5'-end, a 1-kb segment with marked similarity to the third exon regions of presently known cystatin genes. The sequence and hybridization analysis data indicated, however, that the remainder of a putative novel cystatin gene was not present in the ChC5 clone.

In a second round of screening of the genomic library, approximately 4 × 10⁵ additional λ-phage clones were hybridized to the 0.9-kb EcoRI-SacI 5'-end fragment of the ChC5 insert (probe C5ES; Fig. 1). Seventeen clones displaying varying degrees of hybridization signal were isolated (ChC6-22). The clones were rescreened with a 300-bp EcoRI-PstI fragment isolated from the 5'-end of the ChC5 insert (EP-1; Fig. 1). This fragment was selected since it displayed the lowest degree of similarity with the corresponding regions of previously sequenced cystatin genes. Clone ChC21 also exhibited a strong hybridization signal to this probe, and its insert was analyzed by restriction endonuclease cleavages and subsequent Southern blotting. The fragments hybridizing to probe EP-1 after digestions with EcoRI (ChC21-E, 6.0 kb) and BamHI (ChC21-B, 10.1 kb) were subcloned in pEMBL19 and pUC19, respectively, and their restriction maps were determined (Fig. 1). In a parallel experiment, a map of the region corresponding to these fragments was constructed using restriction nuclease digests of genomic DNA and probes C5ES and EP-2 (see below and Fig. 1). This map corresponded fully with the restriction maps of fragments ChC21-E and ChC21-B (data not shown). Since fragment ChC21-B was large enough to contain a normal sized member of the cystatin gene family, further characterization was accomplished by nucleotide sequencing (see legend to Fig. 1).

¹ Portions of this paper (including "Materials and Methods" and Figs. 1-3) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

² The abbreviations used are: kb, kilobase(s); bp, base pair(s).

Nucleotide Sequence Analysis of the Human Cystatin D Gene and Its Flanking Regions - The nucleotide sequence of a 4.3-kb segment of the genomic fragment ChC21-B and the translated amino acid sequence from selected parts thereof are shown in Fig. 2. Sequence analysis identified extended regions with similarities to the cystatin C gene. An especially high degree of similarity was observed between nucleotides 3978 and 4226 (Fig. 2) and the 3'-nontranslated region of the cystatin C cDNA (>80% sequence similarity), which explains the strong positive hybridization signal obtained for clone ChC5 in the initial screening of the genomic library. It appeared that the segment contained the complete coding information for a novel cystatin which we, in line with generally accepted nomenclature proposals (Barrett et al., 1986a), tentatively have called "cystatin D." Comparison with the gene organization exhibited by the other Family 2 cystatin genes with known structures suggested that the cystatin D gene is composed of three exons and two introns. Exon 1 contains all of the 5'-noncoding sequence and the coding information for the first 77 amino acids of the protein, including a putative signal peptide. The first intron is composed of 1826 nucleotides, and it is located between the triplets coding for Gln-77 and Ile-78 (numbering from the first methionine residue), which are present in a conserved region involved in the inhibitory activity of cystatins. This intron is followed by exon 2, which codes for the next 38 amino acids of the protein. The second intron, 1234 nucleotides long, is positioned between the triplets encoding amino acid residues Glu-115 and Glu-116. The third and final exon codes for 27 amino acid residues, ending with an in-phase TAG codon. The nucleotide sequences at the donor and acceptor splice sites of the two introns identified in the cystatin D gene are in total agreement with the consensus sequences derived by Mount (1982). The overall organization of the gene coding for cystatin D is thus very similar to the other described Family 2 cystatin genes (Fig. 3). The distribution and length of introns and exons, the location of intron-exon junctions, and the nucleotide sequence similarities strongly suggest that this gene is a novel member of the cystatin multigene family. According to the nomenclature proposed by Saitoh et al. (1987), we propose to call the cystatin D gene "CST4."
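The exon and intron figures quoted above can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch: the dictionary and function names are illustrative and not part of the original work, and the coding length assumes (as the text states) that both introns lie between complete codons.

```python
# Figures quoted in the text: residues encoded per exon and intron lengths.
EXON_RESIDUES = {"exon 1": 77, "exon 2": 38, "exon 3": 27}
INTRON_LENGTHS = {"intron 1": 1826, "intron 2": 1234}


def residue_ranges(exon_residues):
    """Map each exon to the 1-based, inclusive residue range it encodes."""
    ranges = {}
    first = 1
    for exon, count in exon_residues.items():
        ranges[exon] = (first, first + count - 1)
        first += count
    return ranges


ranges = residue_ranges(EXON_RESIDUES)
total_residues = sum(EXON_RESIDUES.values())
# Both introns fall between codons, so the coding length is 3 nt per residue
# plus the 3-nt TAG stop codon.
coding_nucleotides = 3 * total_residues + 3
intron_nucleotides = sum(INTRON_LENGTHS.values())

print(ranges)
print(total_residues, coding_nucleotides, intron_nucleotides)
```

The residue ranges reproduce the junctions given in the text (Gln-77/Ile-78 and Glu-115/Glu-116), and the three exons together account for the 142 residues of the deduced protein.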

Analysis of the nucleotide sequence determined for the cystatin D gene reveals the presence of a number of consensus sequences which could affect the transcription of the gene. The 5'-flanking region of the gene has a pyrimidine-rich sequence, ATAAA, 96 bp upstream from the putative ATG start codon. This TATA box sequence is common to a large number of eukaryotic promoters (Breathnach and Chambon, 1981) and is identical to that found at comparable positions in the cystatin C, SN, and SA genes. The GC content of this region is about 60%, similar to that found in the corresponding parts of the cystatin SN and SA genes (61 and 64%, respectively), but lower than the notably high content found in the same region of the cystatin C gene (>70%) (Abrahamson et al., 1990). The number of CpG dinucleotides in the 400-bp sequence upstream from the proposed translation start site of the CST4 gene is also lower than the number found in the cystatin C gene; the resulting ratio of CpG/GpC dinucleotides is 1/6 in the cystatin D gene while it is close to unity in the cystatin C gene (Abrahamson et al., 1990). The corresponding values for the cystatin SN and SA genes are 1/9 and 1/16, respectively (Saitoh et al., 1987). In relation to this, it has previously been suggested (Abrahamson et al., 1990) that the high GC content and the large number of CpG dinucleotides in the 5'-flanking region of the cystatin C gene could represent an "HTF island," similar to those usually found in promoter regions of housekeeping genes (Bird, 1986). The absence of these features in the cystatin D gene could be suggestive of a more tissue-restricted expression of this gene, as has been demonstrated for the secretory gland cystatin SN and SA genes.
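The two quantities compared above, GC content and the CpG/GpC dinucleotide ratio, are simple to compute for any upstream sequence. The sketch below illustrates the calculation only; the function names and the short toy sequence are assumptions made for the example and are not taken from the CST4 sequence.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_dinucleotide(seq, dinucleotide):
    """Count occurrences of a dinucleotide, scanning every position."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)


def cpg_gpc_ratio(seq):
    """Ratio of CpG (5'-CG-3') to GpC (5'-GC-3') dinucleotides."""
    cpg = count_dinucleotide(seq, "CG")
    gpc = count_dinucleotide(seq, "GC")
    return cpg / gpc if gpc else float("nan")


# Hypothetical toy sequence, used only to exercise the functions.
toy_upstream = "ATGCGCGCTAGCAT"
print(round(gc_content(toy_upstream), 3))
print(count_dinucleotide(toy_upstream, "CG"), count_dinucleotide(toy_upstream, "GC"))
print(cpg_gpc_ratio(toy_upstream))
```

A ratio near unity, together with a high GC content, is the pattern the text associates with an HTF island; a ratio well below unity, as reported for the cystatin D, SN, and SA genes, indicates CpG depletion.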

The hexanucleotide GGGCGG is present in the 5'-flanking region of the cystatin D gene, 140 bp upstream from the putative initiator methionine ATG codon. It has been reported that the transcription factor Sp1 has a high affinity for this sequence (Dynan, 1986). The cystatin C gene has an identical hexanucleotide sequence located at a comparable distance from the ATG codon (Levy et al., 1989; Abrahamson et al., 1990), while the cystatin SN and SA genes lack this GC box. The absence of CAAT box sequences both in the cystatin C and D genes is remarkable, since such sequences are present in the salivary cystatin genes (Saitoh et al., 1987). Other consensus sequences such as cAMP-responsive elements, steroid-responsive sequences, or enhancer elements are not present in the 5'-flanking region of the cystatin D gene. The 3'-flanking region of the cystatin D gene shows a hexanucleotide ATTAAA located 235 nucleotides downstream from the TAG stop codon, which likely represents the polyadenylation signal of the cystatin D gene.
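Locating consensus elements relative to the start codon, as described above, amounts to a plain substring search. The sketch below is one way this might be done; the function name, the toy sequence, and the distance convention (from the first base of the motif to the A of ATG) are assumptions of the example, since the text does not state which convention was used.

```python
def upstream_distances(seq, motif, atg_index):
    """Distances (bp) from each motif start to the ATG, for motifs upstream of it.

    `atg_index` is the 0-based index of the A in the start codon.
    """
    seq, motif = seq.upper(), motif.upper()
    distances = []
    pos = seq.find(motif)
    while pos != -1 and pos < atg_index:
        distances.append(atg_index - pos)
        pos = seq.find(motif, pos + 1)
    return distances


# Hypothetical toy sequence built so that the motifs sit at the distances
# quoted in the text; it is NOT the CST4 sequence.
toy = "C" * 10 + "GGGCGG" + "C" * 38 + "ATAAA" + "C" * 91 + "ATG" + "CCC"
atg = toy.find("ATG")

print(upstream_distances(toy, "GGGCGG", atg))  # GC box (Sp1 site)
print(upstream_distances(toy, "ATAAA", atg))   # TATA-like element
print(upstream_distances(toy, "CCAAT", atg))   # CAAT box, absent in this toy
```

An empty list for a motif corresponds to the kind of absence reported for the CAAT box in the cystatin C and D genes.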

Primary Structure of Cystatin D Deduced from the Nucleotide Sequence of the Gene and Comparison with Those of Other Cystatins - Assuming that translation starts at the first ATG codon, the cystatin D gene contains an open reading frame coding for a protein composed of 142 amino acid residues (Fig. 2). In this sequence, the presence of a stretch of hydrophobic residues close to the initial methionine suggests the existence of a secretory signal peptide. This proposed signal sequence contains a positively charged amino acid in the N-terminal region (histidine at position 6), which is also a typical feature of leader peptides (von Heijne, 1985). The eukaryotic cell signal peptidase usually cleaves secretory preproteins at a position in the sequence having residues with small, uncharged side chains in positions -3 and -1 from the cleavage site (i.e. in the last part of the signal peptide). Cleavage of precystatin D might therefore occur at position 20/21 (alanine both in positions -3 and -1), resulting in glycine as the putative N-terminal residue of the mature protein. The size of the leader sequence thus processed would be identical to those of the secretory gland cystatins. In addition, and although the signal peptide sequence described for cystatin C is longer (26 amino acid residues) (Abrahamson et al., 1987a), the proposed cleavage of precystatin D allows a maximal alignment of the amino acid sequences of mature cystatin D and C (Fig. 4).
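The (-3, -1) rule invoked above can be expressed as a small check on a candidate cleavage site. This is a simplified sketch, not a signal peptide predictor: the set of "small, uncharged" residues, the function names, and the toy preprotein are all assumptions made for illustration. The toy sequence merely mimics the features named in the text (histidine at position 6, alanine at positions 18 and 20, glycine at position 21) and is not the cystatin D sequence.

```python
# Assumption: a simplified set of residues with small, uncharged side chains.
SMALL_UNCHARGED = set("AGSCT")


def follows_minus3_minus1_rule(preprotein, signal_length):
    """True if the residues at -3 and -1 from the cleavage site are small and uncharged.

    The cleavage site lies between residues `signal_length` and
    `signal_length + 1` (1-based numbering from the first methionine).
    """
    if signal_length < 3 or signal_length >= len(preprotein):
        raise ValueError("signal_length must leave a mature chain and cover -3")
    minus1 = preprotein[signal_length - 1]
    minus3 = preprotein[signal_length - 3]
    return minus1 in SMALL_UNCHARGED and minus3 in SMALL_UNCHARGED


def mature_length(precursor_length, signal_length):
    """Length of the mature chain after removal of the signal peptide."""
    return precursor_length - signal_length


# Hypothetical toy preprotein (first 24 residues only).
toy_preprotein = "MKLPAH" + "LLLLLLLL" + "ALVAVA" + "GSSS"

print(follows_minus3_minus1_rule(toy_preprotein, 20))  # cleavage at 20/21
print(toy_preprotein[20])                              # first mature residue
print(mature_length(142, 20))                          # residues in mature chain
```

With the figures from the text, a 142-residue precursor cleaved at position 20/21 leaves a mature chain of 122 residues.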

If the above mentioned cleavage occurs, mature cystatin D would be composed of 122 amino acid residues, which is consistent with the size of other Family 2 cystatins. Comparisons of its amino acid sequence, as deduced from the nucleotide sequence of the gene, with those of other cystatins, clearly support the assignment of cystatin D as a novel member of this proteinase inhibitor family, since all the regions which appear to be of functional relevance in cystatins are clearly represented in the amino acid sequence of cystatin D. According to the models proposed by several groups (Barrett et al., 1986b; Abrahamson et al., 1987b; Machleidt et al., 1989), further confirmed by x-ray crystallographic analyses (Bode et al., 1988; Stubbs et al., 1990), the binding site of cystatins comprises three well defined regions: an N-terminal segment in which the primary proteinase contact area is represented by side chains of amino acids on the N-terminal side of an invariant glycine residue; a conserved amino acid sequence (consensus Gln-Xaa-Val-Xaa-Gly) in the middle of the polypeptide chain; and a third region (containing a conserved Pro-Trp pair in the case of Family 2 and 3 cystatins) toward the C-terminal end of the molecule. These structural elements are all present in equivalent positions in the amino acid sequence deduced for cystatin D (Fig. 4). One amino acid residue in the cystatin D sequence, asparagine in position 40 of the proposed mature polypeptide chain (64a in Fig. 4), represents an insertion compared with the other human Family 2 cystatins. This residue should be positioned at the opposite side to the proteinase contact area in the three-dimensional structure (Bode et al., 1988) and will hence probably not affect the cysteine proteinase inhibitory function of the protein.

Previously characterized Family 2 cystatins all contain 4 cysteine residues which are involved in the formation of the two disulfide bonds characteristic for the family. The amino acid sequence deduced from the cystatin D gene displays the same 4 cysteine residues located at identical positions (Fig. 4), supporting the proposal that cystatin D is a Family 2 cystatin. Pairwise comparisons for sequence identity between the five human Family 2 cystatins show 51-55% similarity between cystatin D and the other four proteins (Fig. 4). The corresponding values for cystatin C are 51-57%, whereas the mutual similarities between cystatins S, SN, and SA are from 87 to 89%. It is therefore on this basis possible to distinguish three protein subfamilies within cystatin Family 2: the cystatin C subfamily, the cystatin D subfamily, and the secretory gland cystatin subfamily containing cystatins S, SN, and SA.

FIG. 4. Alignment of amino acid sequences for human members of cystatin Family 2. The numbering refers to the cystatin C sequence as deduced from its cDNA, starting from the first residue of the signal peptide (Abrahamson et al., 1987a). Dashes indicate gaps introduced to optimize the alignment; dots represent unknown sequence. Vertical lines indicate residues identical in cystatin D and all four previously described Family 2 cystatins, and arrows indicate positions where the coding sequence of the cystatin D, C, SN, and SA genes are interrupted by intron sequences. Boxes indicate conserved residues that are involved in the cysteine proteinase inhibitory activity of the proteins. The four cysteine residues shown to form two disulfide bridges in cystatin C (Grubb et al., 1984) are marked below the sequences.

Expression of the Cystatin D Gene - Since cystatin D has not yet been demonstrated in any human tissue or body fluid, the expression of its gene was studied. Samples of several human tissues were collected; total RNA was isolated from the samples and subjected to Northern blot analysis. Since the first exon of the cystatin D gene displays the lowest degree of sequence similarity with both cystatin C and the secretory gland cystatin genes, a restriction fragment containing this portion of the gene was chosen as the probe (EP-2; Fig. 1). As can be seen in Fig. 5A, a single hybridizing band was recognized by the probe in human parotid gland RNA. This band corresponds to a mRNA species of 800-900 nucleotides, and it was not detected in any other of the investigated tissues. Reprobing of the filter with the cystatin C cDNA demonstrated, however, that all tissues contained cystatin C mRNA (Fig. 5B). In a similar blot of RNA samples from seminal vesicles, prostate, epididymis, and testis, no signal could be detected with the cystatin D gene probe but, again, all tissues contained cystatin C mRNA (not shown).

FIG. 5. Expression of the cystatin D gene in human tissues. Samples of total RNA were electrophoresed in a 1.4% agarose/formaldehyde denaturing gel, blotted onto a nylon filter, and hybridized to probe EP-2 (A), a genomic restriction fragment containing the first exon of the cystatin D gene and, after deprobing the filter, to a full-length cystatin C cDNA probe (B). The positions of 28 and 18 S rRNA bands are indicated.

In order to rule out the possibility of cross-hybridization with mRNA for the secretory gland cystatins S, SN, or SA, the same filter was hybridized with a 20-mer oligonucleotide, corresponding to nucleotides 698-718 in Fig. 2, which represents a nucleotide sequence specific for the cystatin D gene. The oligonucleotide probe recognized the same band in parotid gland RNA as the genomic probe (data not shown), demonstrating that this band corresponds to cystatin D RNA and not to mRNA encoding any other member of the cystatin family. This apparently tissue-specific expression of the cystatin D gene would be in agreement with the absence of a region with the properties of an HTF island in the 5'-flanking part of the cystatin D gene (see above).
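Whether a short oligonucleotide is specific for one gene can be gauged, in a first approximation, by counting mismatches against related sequences. The sketch below shows that idea under stated assumptions: the function names and all sequences are hypothetical, coordinates are taken as 1-based and inclusive, and strand orientation is ignored. Note that positions 698-718 counted inclusively span 21 nucleotides, so the quoted 20-mer presumably excludes one endpoint; the text does not say which.

```python
def extract(seq, start, end):
    """Return seq[start..end] using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


def min_mismatches(probe, target):
    """Smallest number of mismatches over all ungapped placements of probe on target."""
    n = len(probe)
    if len(target) < n:
        raise ValueError("target is shorter than the probe")
    return min(
        sum(a != b for a, b in zip(probe, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


# Hypothetical toy "gene" and a 20-mer probe taken from positions 5-24.
toy_gene = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
probe = extract(toy_gene, 5, 24)

# Hypothetical paralogous stretch differing from the probe at three positions.
toy_paralog = "A" + probe[1:7] + "T" + probe[8:19] + "C"

print(len(probe))
print(min_mismatches(probe, toy_gene))     # perfect match to its own gene
print(min_mismatches(probe, toy_paralog))  # mismatches against the paralog
```

A probe that matches its own gene perfectly but carries several mismatches against every related sequence is the situation the oligonucleotide experiment relies on; actual hybridization specificity also depends on conditions such as temperature and salt, which this count does not model.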

The detection of a cystatin D message in parotid gland RNA, the presence of a signal peptide, and the sequence similarities with previously known cystatins strongly suggest a role of the product of the here described gene, cystatin D, as a novel salivary cysteine proteinase inhibitor. Studies are in progress to isolate the protein as well as to establish its precise role among the increasing number of human cysteine proteinase inhibitors.

Acknowledgments - We are grateful to Dr. S. Gascón for support and to Dr. Hans Lilja for supplying male sex gland RNA samples.

REFERENCES

Abrahamson, M., Barrett, A. J., Salvesen, G., and Grubb, A. (1986) J. Biol. Chem. 261, 11282-11289

Abrahamson, M., Grubb, A., Olafsson, I., and Lundwall, A. (1987a) FEBS Lett. 216, 229-233

Abrahamson, M., Ritonja, A., Brown, M., Grubb, A., Machleidt, W., and Barrett, A. (1987b) J. Biol. Chem. 262, 9688-9694

Abrahamson, M., Olafsson, I., Palsdottir, A., Ulvsback, M., Lundwall, A., Jensson, O., and Grubb, A. (1990) Biochem. J. 268, 287-294

Al-Hashimi, I., Dickinson, D. P., and Levine, M. J. (1988) J. Biol. Chem. 263, 9381-9387

Assfalg-Machleidt, I., Jochum, M., Klaubert, W., Inthorn, D., and Machleidt, W. (1988) Biol. Chem. Hoppe-Seyler 369, (suppl.) 263-269

Barrett, A. J., Fritz, H., Grubb, A., Isemura, S., Jarvinen, M., Katunuma, N., Machleidt, W., Müller-Esterl, W., Sasaki, M., and Turk, V. (1986a) Biochem. J. 236, 312

Barrett, A. J., Rawlings, N. D., Davies, M. E., Machleidt, W., Salvesen, G., and Turk, V. (1986b) in Proteinase Inhibitors (Barrett, A. J., and Salvesen, G., eds) pp. 515-569, Elsevier Science Publishers B.V., Amsterdam

Benton, W. D., and Davis, R. W. (1977) Science 196, 180-182

Bird, A. P. (1986) Nature 321, 209-213

Bode, W., Engh, R., Musil, D., Thiele, U., Huber, R., Karshikov, A., Brzin, J., and Turk, V. (1988) EMBO J. 7, 2593-2599

Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383

Buttle, D. J., Burnett, D., and Abrahamson, M. (1990) Scand. J. Clin. Lab. Invest. 50, 509-516

Delaissé, J.-M., Eeckout, Y., and Vaes, G. (1984) Biochem. Biophys. Res. Commun. 125, 441-447

Dynan, W. S. (1986) Trends Genet. 2, 209-213

Ghiso, J., Jensson, O., and Frangione, B. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 2974-2978

Grubb, A., and Lofberg, H. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 3024-3027

Grubb, A., Lofberg, H., and Barrett, A. J. (1984) FEBS Lett. 170, 370-374

Isemura, S., Saitoh, E., and Sanada, K. (1984) J. Biochem. (Tokyo) 96, 489-498

Isemura, S., Saitoh, E., and Sanada, K. (1986) FEBS Lett. 198, 145-149

Isemura, S., Saitoh, E., and Sanada, K. (1987) J. Biochem. (Tokyo) 102, 693-704

Koppel, P., Baici, A., Keist, R., Matzku, S., and Keller, R. (1984) Exp. Cell Biol. 52, 293-299

Levy, E., López-Otín, C., Ghiso, J., Geltner, D., and Frangione, B. (1989) J. Exp. Med. 169, 1771-1778

Lofberg, H., Grubb, A. O., Nilsson, E. K., Jensson, O., Gudmundsson, G., Blondal, H., Arnason, A., and Thorsteinsson, L. (1987) Stroke 18, 431-440

Machleidt, W., Laber, B., Assfalg-Machleidt, I., Esterl, A., Wiegand, G., Kos, J., Turk, V., and Bode, W. (1989) FEBS Lett. 243, 234-238

Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Mort, J. S., Recklies, A. D., and Poole, A. R. (1984) Arthritis Rheum. 27, 509-515

Mount, S. (1982) Nucleic Acids Res. 10, 459-472

Mount, D. M., and Conrad, B. (1986) Nucleic Acids Res. 14, 443-454

Müller-Esterl, W., Iwanaga, S., and Nakanishi, S. (1986) Trends Biochem. Sci. 11, 336-339

Palsdottir, A., Abrahamson, M., Thorsteinsson, L., Arnason, A., Olafsson, I., Grubb, A., and Jensson, O. (1988) Lancet ii, 603-604

Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448

Pietras, R. J., Szego, C. M., Mangan, C. E., Seeler, B. J., Burtnett, M. M., and Orevi, M. (1978) Obstet. Gynecol. 52, 321-327

Rawlings, N. D., and Barrett, A. J. (1990) J. Mol. Evol. 30, 60-71

Saitoh, E., Kim, H. S., Smithies, O., and Maeda, N. (1987) Gene (Amst.) 61, 329-338

Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467

Stubbs, M. T., Laber, B., Bode, W., Huber, R., Jerala, R., Lenarcic, B., and Turk, V. (1990) EMBO J. 9, 1939-1947

von Heijne, G. (1985) J. Mol. Biol. 184, 99-105

Supplementary Material to

STRUCTURE AND EXPRESSION OF THE GENE ENCODING CYSTATIN D, A NOVEL HUMAN CYSTEINE PROTEINASE INHIBITOR

José P. Freije, Magnus Abrahamson, Isleifur Olafsson, Gloria Velasco, Anders Grubb, and Carlos López-Otín
