the structure of characterization the rat aggrecan of its promoter

9
THE JOURNAL OF BIOLOGICAL. CHEMISTRY Vol. 269, No. 46, Issue of November 18, pp. 29232-29240, 1994 Printed in U.S.A. The Structure of Characterization the Rat Aggrecan of Its Promoter* Gene and Preliminary (Received for publication, April 26, 1994,and in revised form,August 29, 1994) Kurt J. DoegeSOn, Katherine Garrisonfll, Silvija N. CoulterS, and Yoshi Yamada** From the $Research Unit, Shriners Hospitals for Crippled Children and the §Departments of Biochemistry and Molecular Biology, Cell Biology and Anatomy, Oregon Health Sciences University, Portland, Oregon 97201 and the **Laboratory of Developmental Biology, NIDR, National Institutes of Health, Bethesda, Maryland 20892 Aggrecan is a major structural component of cartilage extracellular matrix and a specific gene product of dif- ferentiated chondrocytes. cDNA clones have been used to isolate rat aggrecan genomic clones from phage and cosmid libraries, producing over 80 kilobases (kb) of overlapping DNA containing the complete rat aggrecan gene, including 12 kb of 5’- and 8 kb of 3”flanking DNA. DNA sequencing shows 18 exons, mostof which encode structural or functional modules;exceptions are do- mains G1-B and G2-B, which are split into two exons and the G3 lectin domain, which is encoded by three exons. There is one expressed epidermal growth factor-like exon and in addition a non-expressed “pseudo-exon” en- coding a heavily mutated epidermal growth factor-like domain. Intron sizes have been determined by restric- tion mapping and inter-exonpolymerase chain reaction; a 30-kb intron separates exons 1 and 2. Exon 1 has been mapped by primer extension and S1 nuclease protec- tion; it encodes 381 base pairs (bp) of 5”untranslated sequence. There is a minor promoter which initiates transcription an additional 68 bp 5’ of the major pro- moter start site. DNA sequence is reported for a 529-bp fragment encompassing exon 1, including 120 bp of 5’- flanking DNA comprising the promoter. This promoter is lacking the TATAA or CCAAT elements but has several putative binding sites for transcription factors. A922-bp DNA fragment with 640-bp 5”flanking DNA and 282-bp exon 1 sequence showed higher promoter activity in transfected chondrocytes than in fibroblasts, is com- pletely inactive in the reverse orientation, and is strongly enhanceable in the forward direction by the SV40 enhancer. Aggrecan, the large aggregating proteoglycan of cartilage, is a major component of the wet weight of cartilage and confers many of the physical properties of cartilage essential to its function (1). Aggrecan is a marker for cartilage and is expressed abundantly only in this tissue, but has been reported at low levels in other tissues (2 and references therein). More recently, monoclonal antibodies have been used to distinguish aggrecan * This work was supported by a grant from Shriners of North America (to K. J. D.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. to the GenBankTMIEMBL Data Bank with accession number(s) 503485 The nucleotide sequence(s1 reported in this paper has been submitted and U13974. 1 To whom correspondence should be addressed: Research Unit, Shri- ners Hospital for Crippled Children, 3101 S. W. Sam Jackson Park Rd., Portland, OR 97201. Fax: 503-221-3451. 17837. 11 Present address: Biology Dept., Bucknell University, Lewisburg, PA from other aggregating proteoglycans in a tissue distribution study, and some non-cartilage tissues inchicken were found to be immunologically reactive with anti-aggrecan antibodies (3). Several cDNA sequences have been reported for aggrecans from different species: rat (4,5), chicken (6,7,29),bovine (8,9), human (10, 111, and mouse (12). Application of these probes to blotting and in situ hybridization studies detects expression of aggrecan in developing embryos only in cartilaginous tissues or notochord (13). A recent study using the polymerase chain re- action has demonstrated low levels of aggrecan mRNA in chick embryo calvarium (141, but theexpression of aggrecan is more restricted to cartilage than is that of type I1 collagen, for ex- ample (15). Steady statelevels of mRNA for type I1 collagen in chondrocytes are at least 10-fold higher than for aggrecan or link protein (16), and the tissue distribution studies may reflect a sensitivity threshold. To address questions of gene regulation in chondrocyte differentiation, it will be important to determine quantitatively the transcriptional levels of the various chon- drocyte marker genes in different tissues during development. It will also be important to examine the role that alternative splicing may play in distinguishing different stages of chondro- cyte differentiation, as seen with type I1 collagen (17) and po- tentially aggrecan (18). Aggrecan is the first described member of a gene family of large aggregating proteoglycans, which also includes versican (191, neurocan (20), and brevican (21). Versican (PG-M) has a wider tissue distribution than aggrecan and is highly expressed in chondroprogenitor cells, but decreases in abundance with chondrogenesis, at least in chickens (3,221; neurocan and brevi- can appear restricted to neural tissues. These family members share many structural features includingan amino-terminal do- main, G1, which is related to link protein and binds to hyalu- ronan and link protein through subdomainsA, B, and B’. Pro- teoglycans of this group also possess a COOH-terminal domain, (G3 in aggrecan) composed of a C-type lectin motif (LECl1(23), a complement regulatory protein-related motif (CRP) (241, and one or two repeats of epidermal growth factor-like motifs (EGF) (25). This particular group of motifs is shared with the selectins (26), which are cell adhesion proteins. The G1 and G3-like do- mains in most of these proteoglycans are separated by long, structurally extended sequences, apparently unrelated, which provide attachment points for glycosaminoglycans and other carbohydrate chains; these domains are called keratan sulfate and chondroitin sulfate in aggrecan. Aggrecan is the only family member with a third globular domain, G2, which contains ad- ditional link-related motifs B and B’; G2 is adjacent to G1, sepa- rated by an extended region called interglobular domain. The abbreviations used are: LEC, lectin; EGF, epidermal growth factor; CRP,complementregulatoryprotein; PCR, polymerase chain reaction; CAT, chloramphenicol acetyltransferase; bp, base pairk); kb, kilobase(s); PIPES, 1,4-piperazinediethanesulfonic acid; nt, nucleotide(s). 29232

Upload: nguyendieu

Post on 19-Dec-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

THE JOURNAL OF BIOLOGICAL. CHEMISTRY Vol. 269, No. 46, Issue of November 18, pp. 29232-29240, 1994 Printed in U.S.A.

The Structure of Characterization

the Rat Aggrecan of Its Promoter*

Gene and Preliminary

(Received for publication, April 26, 1994, and in revised form, August 29, 1994)

Kurt J. DoegeSOn, Katherine Garrisonfll, Silvija N. CoulterS, and Yoshi Yamada** From the $Research Unit, Shriners Hospitals for Crippled Children and the §Departments of Biochemistry and Molecular Biology, Cell Biology and Anatomy, Oregon Health Sciences University, Portland, Oregon 97201 and the **Laboratory of Developmental Biology, NIDR, National Institutes of Health, Bethesda, Maryland 20892

Aggrecan is a major structural component of cartilage extracellular matrix and a specific gene product of dif- ferentiated chondrocytes. cDNA clones have been used to isolate rat aggrecan genomic clones from phage and cosmid libraries, producing over 80 kilobases (kb) of overlapping DNA containing the complete rat aggrecan gene, including 12 kb of 5’- and 8 kb of 3”flanking DNA. DNA sequencing shows 18 exons, most of which encode structural or functional modules; exceptions are do- mains G1-B and G2-B, which are split into two exons and the G3 lectin domain, which is encoded by three exons. There is one expressed epidermal growth factor-like exon and in addition a non-expressed “pseudo-exon” en- coding a heavily mutated epidermal growth factor-like domain. Intron sizes have been determined by restric- tion mapping and inter-exon polymerase chain reaction; a 30-kb intron separates exons 1 and 2. Exon 1 has been mapped by primer extension and S1 nuclease protec- tion; it encodes 381 base pairs (bp) of 5”untranslated sequence. There is a minor promoter which initiates transcription an additional 68 bp 5’ of the major pro- moter start site. DNA sequence is reported for a 529-bp fragment encompassing exon 1, including 120 bp of 5’- flanking DNA comprising the promoter. This promoter is lacking the TATAA or CCAAT elements but has several putative binding sites for transcription factors. A922-bp DNA fragment with 640-bp 5”flanking DNA and 282-bp exon 1 sequence showed higher promoter activity in transfected chondrocytes than in fibroblasts, is com- pletely inactive in the reverse orientation, and is strongly enhanceable in the forward direction by the SV40 enhancer.

Aggrecan, the large aggregating proteoglycan of cartilage, is a major component of the wet weight of cartilage and confers many of the physical properties of cartilage essential to its function (1). Aggrecan is a marker for cartilage and is expressed abundantly only in this tissue, but has been reported at low levels in other tissues (2 and references therein). More recently, monoclonal antibodies have been used to distinguish aggrecan

* This work was supported by a grant from Shriners of North America ( to K. J. D.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

to the GenBankTMIEMBL Data Bank with accession number(s) 503485 The nucleotide sequence(s1 reported in this paper has been submitted

and U13974. 1 To whom correspondence should be addressed: Research Unit, Shri-

ners Hospital for Crippled Children, 3101 S. W. Sam Jackson Park Rd., Portland, OR 97201. Fax: 503-221-3451.

17837. 11 Present address: Biology Dept., Bucknell University, Lewisburg, PA

from other aggregating proteoglycans in a tissue distribution study, and some non-cartilage tissues in chicken were found to be immunologically reactive with anti-aggrecan antibodies (3). Several cDNA sequences have been reported for aggrecans from different species: rat (4,5), chicken (6,7,29), bovine (8,9), human (10, 111, and mouse (12). Application of these probes to blotting and in situ hybridization studies detects expression of aggrecan in developing embryos only in cartilaginous tissues or notochord (13). A recent study using the polymerase chain re- action has demonstrated low levels of aggrecan mRNA in chick embryo calvarium (141, but the expression of aggrecan is more restricted to cartilage than is that of type I1 collagen, for ex- ample (15). Steady state levels of mRNA for type I1 collagen in chondrocytes are at least 10-fold higher than for aggrecan or link protein (16), and the tissue distribution studies may reflect a sensitivity threshold. To address questions of gene regulation in chondrocyte differentiation, it will be important to determine quantitatively the transcriptional levels of the various chon- drocyte marker genes in different tissues during development. I t will also be important to examine the role that alternative splicing may play in distinguishing different stages of chondro- cyte differentiation, as seen with type I1 collagen (17) and po- tentially aggrecan (18).

Aggrecan is the first described member of a gene family of large aggregating proteoglycans, which also includes versican (191, neurocan (20), and brevican (21). Versican (PG-M) has a wider tissue distribution than aggrecan and is highly expressed in chondroprogenitor cells, but decreases in abundance with chondrogenesis, at least in chickens (3,221; neurocan and brevi- can appear restricted to neural tissues. These family members share many structural features includingan amino-terminal do- main, G1, which is related to link protein and binds to hyalu- ronan and link protein through subdomains A, B, and B’. Pro- teoglycans of this group also possess a COOH-terminal domain, (G3 in aggrecan) composed of a C-type lectin motif (LECl1(23), a complement regulatory protein-related motif (CRP) (241, and one or two repeats of epidermal growth factor-like motifs (EGF) (25). This particular group of motifs is shared with the selectins (26), which are cell adhesion proteins. The G1 and G3-like do- mains in most of these proteoglycans are separated by long, structurally extended sequences, apparently unrelated, which provide attachment points for glycosaminoglycans and other carbohydrate chains; these domains are called keratan sulfate and chondroitin sulfate in aggrecan. Aggrecan is the only family member with a third globular domain, G2, which contains ad- ditional link-related motifs B and B’; G2 is adjacent to G1, sepa- rated by an extended region called interglobular domain.

The abbreviations used are: LEC, lectin; EGF, epidermal growth factor; CRP, complement regulatory protein; PCR, polymerase chain reaction; CAT, chloramphenicol acetyltransferase; bp, base pairk); kb, kilobase(s); PIPES, 1,4-piperazinediethanesulfonic acid; nt, nucleotide(s).

29232

Rat Aggrecan Gene and Promoter 29233

The aggrecan gene has been mapped to human chromosome 15q26.1(27) and to mouse chromosome 7 (12). While aggrecan has not been linked thus far to human chondrodystrophies or other heritable disorders of the skeletal system (281, there are a number of animal mutants which are apparently defective in this gene, including the nanomelic chicken (29), and the carti- lage matrix-deficient mouse (cmd) (30). The altered expression of aggrecan seems to be a hallmark of osteoarthitis and other degenerative diseases of cartilage. An understanding of the structure and regulation of the aggrecan gene will enhance our ability to identify mutations and altered expression of this gene and develop therapies for debilitating conditions ofjoint tissues and abnormal skeletal growth.

We report here the first complete description of the gene structure for a member of this proteoglycan family, along with the beginning of functional studies of the promoter. Previous reports have included the sequence of five exons encoding the chicken aggrecan G3 domain (311, a symposium presentation in which some of the preliminary rat and human aggrecan exon structure was summarized and compared (321, and a review which summarized work in progress on the rat aggrecan and link protein genes (33). This work provides a starting point for pursuing an understanding of the processes by which aggrecan expression is regulated.

EXPERIMENTAL PROCEDURES Isolation of Genomic Clones--Two A phage genomic libraries were

screened using rat aggrecan cDNAs as probes. A partial EcoRI rat genomic library in Charon 4A (gift of Dr. Tom Sargent, NICHHD, NIH) was screened using a mixture of labeled cDNAs encoding the entire aggrecan structural sequence: inserts from clones pRPG1-5 (5); 4 x lo5 plaques were screened in duplicate as described (ll), approximately 30 clones were plaque purified, and of the eight that were grown for DNA extraction (34) three different clones were obtained as multiple isolates. A partial HaeIII rat genomic library in Charon 4A (also obtained from Dr. Tom Sargent) was screened using a 353-bp Sau3AI fragment of cDNA clone RPG5.1000 (nucleotides +63 to +981 in the complete cDNA sequence)'; the probe fragment included 318 bp of 5"leader sequence and 35 bp of the signal peptide coding sequence. 11 plaque-purified clones were obtained, and eight were selected for further analysis yield- ing three different clones. Finally, a rat genomic cosmid library (Clon- tech) was screened by colony hybridization (34) using as the first-screen probe a 288-bp PCR fragment of 5"leader sequence determined to be within exon 1 (nucleotides 93-381); subsequent rounds of screening were performed using the full-length RPG5.1000 cDNA as probe (exons 1-4). Clones were initially characterized by Southern hybridization of restriction digests using cDNA and oligonucleotide probes as described previously (5).

DNA Sequencing-Genomic fragments from phage clones were sub- cloned into pUC18 and sequenced directly from CsCl density gradient- purified double-stranded plasmid using modified T7 DNA polymerase (United States Biochemical Corp., Cleveland, OH), the dideoxy-termi- nation method (35), and oligonucleotide primers (20-24 mers) from the cDNA sequence (5). Generally 5 pg of plasmid DNA was denatured in 0.2 M NaOH, precipitated with 1.4 M ammonium acetate and 3.5 vol- umes of ethanol, and redissolved in 10 pl of polymerase buffer (USB) containing 2 pmol of primer. The primer template-primer mixture was annealed by first heating to 70 "C for 10 min, then cooling to 37 "C over a period of 30 min prior to sequencing. The 1.8-kb EcoRI-BamHI fi-ag- ment containing exon 1 was sequenced by the random fragment strat- egy (36), in which a concatemerized fragment is mechanically sheared and a size-selected population of fragments prepared and blunt-end cloned into SmaI-cut M13 for single strand sequencing. The sequence was obtained at least three times from each strand and was assembled using the UWGCG fragment assembly programs (37). In some cases sequences were obtained directly from genomic cosmid or phage clones, or PCR products, using Taq polymerase and a thermocycler as recom- mended by the supplier of the fmol kit (Promega).

Primer Extension Analysis-RNA was prepared from Swarm rat chondrosarcoma cells released from matrix by collagenase digestion as

K. J. Doege, K. Garrison, S. N. Coulter, and Y. Yamada, unpublished results.

described previously (4), using the acid guanidine/phenol method (38). Poly(A)' mRNA was prepared by oligo(dT) chromatography (34). Oligo- nucleotide primers were end-labeled using [y?'PIATP and polynucle- otide kinase to a specific activity of approximately 2 x lo* counts/min/ pg. 2.5 x lo6 countdmin of primer was combined with 10 pg of poly(A)+ RNA in a volume of 25 pl containing 5 m~ sodium phosphate, pH 6.8, and 5 mM EDTA. The sample was heated at 90 "C for 10 min, made 77 mM in potassium chloride, and slow cooled to 42 "C. The sample was then diluted to 50 p1 containing 56 mM potassium chloride, 50 mM Tris-HC1, pH 8.3, 5 m~ dithiothreitol, 7.5 m~ MgCl,, 1 mM each of the four deoxynucleotide triphosphates, 1 p1 of RNasin (Promega), and 400 units of avian myeloblastosis virus reverse transcriptase (Promega). The reaction was incubated for 2 h at 42 "C, heat-inactivated at 90 "C, phenolkhloroform extracted, and the nucleic acids precipitated with 66% ethanol, 0.3 M sodium acetate, 10 pg of carrier tRNA. The products were dissolved in 80% formamide, 0.1% each xylene cyano1 and brom- phenol blue tracking dyes, heated at 90 "C, and loaded on a 7 M urea, 8% polyacrylamide sequencing gel for sizing relative to a sequence reaction from the same region of DNA, or in some cases end-labeled restriction fragments.

S1 Nuclease Protection-The method described (34) for double- stranded DNA probes was followed. End-labeled fragment (lo6 counts/ min, 500 ng) was coprecipitated with 10 pg of either poly(AY RNAfrom rat chondrosarcoma or yeast tRNA. The pelleted nucleic acids were dissolved in 80% formamide, 0.4 M PIPES, pH 6.4, 0.4 M NaCl, 1 mM EDTA, denatured for 10 min at 90 "C, and annealed for 3 h at 52-62 "C, according to GC content (34). The DNA-RNA hybrids were subjected to S1 nuclease digestion in 300 p1 of 0.05 M ammonium acetate, pH 4.5, 0.25 M NaC1, 0.01 M zinc chloride containing 6.25 pg of denatured salmon sperm DNA, between 10 and 90 units of S1 nuclease was added, and samples were incubated 30 min at 37 "C. Reactions were stopped with addition of 75 p1 of ice-cold 4 M ammonium acetate, pH 4.5, 0.1 M EDTA, and 25 pg of tRNA, extracted with phenoVchloroform (l:l), and the nucleic acids precipitated with isopropyl alcohol for analysis on denaturing polyacrylamide gel electrophoresis as described for the primer extension products.

PCR Analysis-PCR amplification of DNA or reverse-transcribed RNA templates was accomplished using standard methods as described previously (11). Annealing temperatures were calculated by the PRIMER program (kindly provided by Drs. S . Lincoln, M. Daly, and E. Lander, Massachusetts Institute of Technology Center for Genome Re- search). For amplification of the EGF2 domain, primers which flanked the putative insertion site of the EGF exon in the cDNA were used: bp 5549-5569 and 6602-6622 (numbered including the EGF2 exon se- quence). EcoRI recognition sites and spacer sequences were included at the 5'-ends for ease of cloning.

Promoter-Reporter Plasmid Constructs-The series of CAT vectors available from Promega was used. The initial promoter constructs, p64OCAT+ and p64OCAT- are the forward and reverse orientations, respectively, of a 922-bp EcoRI-PstI fragment obtained from the 1.8-kb EcoRI-BamHI fragment encompassing exon 1. The EcoRI-PstI fragment was rendered blunt-ended by T4 DNA polymerase and ligated to Hin- dIII linkers for cloning into the Hind111 site of the Basic-CAT vector. This fragment was also cloned in both orientations into the CAT vector with the SV40 enhancer in the downstream BamHI site to give p640CAT-En+ and p640CAT-En-.

Zkansfections-Primary chick embryo sternal chondrocytes or tendon fibroblasts were obtained from day 16 embryos of white leghorn chick- ens; cells were liberated from tissues by treatment with 0.3% collagen- ase (Worthington Biochemical Corp.) in 0.5 x Ham's F-12 (Life Tech- nologies, Inc.), 3% penicillin-streptomycin for 3 h at 37 "C. The cells were pelleted and resuspended to 1.25 x lo' cells/ml in phosphate- buffered saline. lo7 cells were cotransfected by electroporation with 20-30 pg of CAT reporter plasmid and an equal amount of an SV40 promoter-driven P-galactosidase reporter plasmid, pSVp (Clontech), as an internal reference. Electroporation was performed using a Gene Pulser (Bio-Rad) at 0.45 kV and 250 microfarads. The cells were plated in 20 ml of Ham's F-12,10% fetal bovine serum (Life Technologies, Inc.), 3% penicillin-streptomycin onto 100-mm culture dishes and grown in 5% CO, at 37 "C for 60-72 h. The cells were harvested, washed three times in phosphate-buffered saline, resuspended in 150 pl of 0.25 M Tris, pH 7.8, lysed by three successive cycles of freeze-thaw, and centrifuged at 14,000 x g, 4 "C for 20 min. The extract was quick-frozen and stored at -80 "C.

P-Galactosidase assays were performed as described (34). CAT ac- tivity was assayed by silica gel thin layer chromatography as de- scribed (34) using volumes of extract normalized to equivalent P-galac- tosidase activities. Reaction products were visualized and quantitated

29234 Rat Aggrecan Gene and Promoter

10 20 30 40 50 60 70 80 I

FIG. 1. Restriction map of the rat aggrecan gene and positions of ex- ons. The scale is shown at the top in kb. The second line displays the number, po- sitions, and relative sizes of the exons. The genomic fragments cloned from A phage Charon 4A libraries are shown as solid rectangles, while the cosmid insert obtained from the p W 1 5 library is

sites for EcoRI are indicated as E on the shown with diagonal stripes. Restriction

for BamHI ( B ) , HindIII ( H ) , KpnI (K) , clones themselves, while additional sites

SmaI ( S ) , and X6aI (X) are shown on in- dividual lines. The asterisk indicates the position in A7 where an EcoRI site found in Cos77 and A 1 is missing.

B J I I I I I I I I I I I 1 I I

H I I I I 111 I I I II I I I I

on a Betascope 603 (Intelligenetics, Mountain View, CA) radioanalytic imager.

RESULTS Three genomic libraries were screened to obtain the complete

aggrecan gene. The initial screening of a partial EcoRI library yielded three different clones encoding all but the signal se- quence and 5"untranslated region of the aggrecan structural sequence. These are shown as A31 , A l , and A l l in Fig. 1. At- tempts to clone the remaining upstream exons using oligonu- cleotide probes were unsuccessful; finally a cDNA fragment encoding most of the 5"untranslated sequence and signal pep- tide was obtained and used to screen a partial HaeIII library. Three additional clones were obtained, shown as A5, h4, and A7 in Fig. 1. These two groups of clones from different libraries encoded all the exons of the rat aggrecan gene but did not overlap in the first intron. To clone this missing portion, a cosmid library was screened using cDNA probes for exons 1-4; the resulting 42-kb clone is shown as cos77 in Fig. 1.

Analysis of Genomic Clones-EcoRI fragments of the genomic clones were subcloned into pUC18, and rough maps of the coding regions were constructed by Southern hybridization analysis following cleavage with the restriction enzymes BamHI, HindIII, KpnI, SmaI, and XbaI. The exon boundaries were precisely determined by sequencing subcloned genomic fragments with exon-specific primers, and intron sizes were mapped by a combination of restriction blot analysis with exon- specific oligonucleotide probes, sizing of inter-exon PCR prod- ucts, and sequencing. These data are summarized graphically in Fig. 1 and in more detail in Fig. 2. The exons are distributed within a 65 kb length, much of which is taken up by a 30-kb intron separating exons 1 and 2. There are 18 exons, most of which correspond to structural domains of the molecule; nota- ble is the extremely large (3.5 kb) exon encoding the entire chondroitin-sulfate attachment region of the protein. The 5'- leader sequence, the signal peptide, G1-A, G1-B', G2-B', the interglobular domain, keratan sulfate, G3-EGF, and G3-CRP domains are also encoded by single exons. Exceptions are the G1-B, G2-B, and G3-LEC domains, which are encoded as two, two, and three exons, respectively. The 3'-terminal exon 18 is composed of a short coding sequence, the termination codon, and, presumably, the entire untranslated trailer sequence. In addition, there appears to be a "pseudo-exon" encoding an ad- ditional EGF-like domain as discussed below; this is designated as X in Figs. 1 and 2. The internal introns are all of the sym- metrical phase 1 type (39), except for those which split the struc-

tural domains G1-B (intron 4), G2-B (intron 8), and G3-LEC (intron 15); these are phase 2-2, 2-2, and 0-0, respectively.

During the course of sequencing these exons, an error was discovered in the reported cDNA sequence for exon 6 (Gl-B') (5); nt 1210-1217 should be corrected from GGACGGTGG to GGAGGCTGG. The corresponding amino acid sequence changes are from (CRTVG) to (CRRLG). The rat amino acid sequence now is identical to the corresponding human se- quence (11). This error has been corrected in the most recent GenBank entry.

Cloning and Analysis of EGF-like Repeat Exons-A reverse transcriptase PCR product from rat chondrocyte and chondro- sarcoma mRNA was previously identified (11) which was thought to be derived from a minor subset of aggrecan tran- scripts containing a single EGF-like repeat; this identification has been confirmed by cloning and sequencing of the PCR prod- uct (not shown). Surprisingly, this rat aggrecan EGF sequence (40) was more similar to the EGF2 motif in human versican (19) than to the first reported EGF motif in human aggrecan (10) (Fig. 4). The location of this rat EGF-like exon was shown to be 3 kb upstream of the first lectin-like exon of G3 (Figs. 1 and 2) by Southern blotting and by PCR amplification between this exon and the ends of the genomic phage clones A 1 and A31. A second EGF-like sequence was detected in the rat aggrecan gene by low stringency Southern blot hybridizations, using an oligonucleotide probe from the human aggrecan EGF1-like se- quence. The positive 1-kb KpnI-BamHI fragment of A 1 was cloned and sequenced and found to harbor an EGF1-like motif (Fig. 3) about 300 bp downstream from the KpnI site. The deduced translation product of this new sequence is compared by the multiple sequence alignment program PILEUP (37) to other EGF sequences (Fig. 4) including a recently described second expressed EGF motif (EGF2) from human aggrecan (40). The rat aggrecan EGFl sequence shown in Figs. 3 and 4 has undergone loss of 3 of the 6 cysteine residues conserved throughout this motif family, has a 5 amino acid portion de- leted, and has also mutated invariant proline and tyrosine residues. These structural differences make it unlikely that the sequence motif can retain functionality in the protein, and in- deed the 5' intron-exon junction is lacking an appropriate splice acceptor sequence. Furthermore, it has not been possible to detect expression of this exon in rat RNA by specifically primed reverse transcriptase PCR (not shown). As this rat exon is located in an homologous position to the expressed EGFl exon in human aggrecan (40), it appears to be a lost-function

Rat Aggrecan Gene and Promoter 29235

FIG. 2. Summary of exon and intron size and location in the rat aggrecan gene. Exon number is shown with corre- sponding domain (abbreviations dis- cussed in text). The exon size in base pairs and amino acid residues is given along with the sequence surrounding the intron exon boundary (exon sequence in upper case, intron in lower case) and the nucle- otide-numbered position of the exon in the complete cDNA sequence. The intron size in kb is presented, as measured by inter- exon PCR (all but 1 and 13x, which were estimated by restriction mapping). Exon 18 extends indefinitely as 3'-untranslated trailer sequence; the gap shown in the up- per case sequence is at the end of the pub- lished cDNA sequence (5).

EXON DOMAIN SIZE SITE EXON ACCEPTOR EXON DOlooR EXON SITE INTRON

SEQU'JNCE SITE I N CDNA S I Z E (BP/AA) (KB)

1 30.0 1-381 e a a CCACCT TTCMO 381/- 5"UT

2 - 382-458 ePag TaAACT A M CCA Q t C c a g 76/23 M

2.0 P

3 AC CAT Ca-g 384/129 Ql-A D R

AAA Q 1.5 459-842 K

4 1 . 8 843-1017 e m g QC ATT QTC A0 t a a g 174/58 Ql-B 0 1 V R

01-8 A TAC aacag 127/42 Y

QAQ Q 1.4 1018-1145 E

I I I

ACA G I etppg I 1146-1439 1 0.9

I I I

Q3-3' 6799-6937+ CC TQCAT CCCACT cgcag 139+/24 P N

t t g g a g c a t c t c a c c a c c c a p c a t g g a e g g c t g t t c t g p c t c t e t g 60

a g t t C t T O A C O C C T A C C C T T A O O T T C T O T A T A C A O a A a 120

* R L P S R F C I E E P C Q P Q T Y

C ' M a A U I C A T A C T M C O O C C C A C ~ ~ ~ Q - ~ ~ C ~ ~ A ~ a a g c t c 180

L E T Y L R P T Q H S Q Q A C D

tcattagctatgagtagptagacatggtgc 210

FIG. 3. Genomic DNA sequence encoding a second, unex- pressed EGF-like motif in the rat aggecan gene. A rat genomic fragment which hybridized to an oligonucleotide sequence from the human aggrecan EGFl sequence (IO) was cloned and sequenced as described under "Experimental Procedures." The translated open read- ing frame shown in single letter code matched well with the human EGFl amino acid sequence. A stop codon limits the 5'-end of the coding sequence (asterisk), while a consensus splice-donor sequence (box) ap- pears at the 3'-end of the homologous sequence. Two potential 5"splice acceptor sequences are also boxed, but they lie outside the region of homology and upstream of the stop codon.

pseudo-exon in rat; it has accordingly been designated x in Figs. 1 and 2.

Intron Analysis-The exon positions were mapped initially by Southern blot hybridization using exon-specific probes, and most of the intron sizes were then confirmed by inter-exon PCR the exceptions were introns 1, 12, and 13 which a t 30,5, and 3.5 kb were too large (as the ragEGFlX pseudo-exon is not expressed, it is considered to be contained within intron 12). A portion of intron 13 was amplified from total genomic DNA,

Hag1 R a g l x H a g 2 R a g 2 __ Vnl Vn2 PD *I

FIG. 4. Multiple sequence alignment of G.7 EGF motifs. The rat aggrecan EGFl tRag1.x~ and EGF2 (Rag21 (this work, and 40) se- quences were compared using the GCG PILEUP program to the human aggrecan EGFl (HagI) (10) and EGF2 (Hag21 (40) and the human versican EGFl ( V n l ) and EGFZ (Vn2) sequences (19). The order of sequences in the alignment is determined by similarity. Conserved resi- dues throughout the group are boxed with heavy shading. Those resi- dues conserved within the EGFZ group are boxed with light shading, and the EGFI-conserved residues are in stippled boxes. Vnl was placed by the program in the EGF2 group.

using an upstream primer from exon 13 (EGF2) and a down- stream primer from the 5-end of A31, to give a product of 1.1 kb (not shown). This product size was consistent with the non- overlapping phage clones A 1 and A31 being contiguous, with a common terminal EcoRI site, as shown in Fig. 1.

The size of intron 1 was determined from restriction mapping of a single cosmid clone which overlapped the phage clones A5 and A4. The size of the non-overlapping "gap" region between the phage clones was 14 kb in the cosmid clone, and the au- thenticity of this non-overlapping portion of the cosmid clone was supported by Southern blot analysis of total rat genomic DNA, using two different probes within the central part of the

29236 Rat Aggrecan Gene and Promoter

( - 1 19)

tccccctgcgaactggccccgggggtggggtttccctgtgcgctcgcccccacccctagtttgtgccctcccctc CCg -40 I

agggggacgcttgaccggggcccccaccccaaagggacacgcgagc ggggtggggatcaaacacgggaggggag~ U L

?+I L

gc ctatgtatgtgtcaccgcgcaccat ccgc ACCTACCTCCCCGCGGTGCCAGAGGGGCTCACAGAGCTGA ~gatacatacaca~tggcgcgtggta~TGGATGGAGGGGCGCCACGGTCTCCCCGAGTGTCTCGACT +40

I GGACGC~TCCAGTACCGCCCAAG~CCCTCATTCTCCGCGGCTAGCATCTGACGCAGGTGCCGAGGGMCTCGGGGCACC CCTGCGCACGTCATGGCGGGTTCC~GGAGTMGAGGCGCCGATCGTAGACTGCGTCCACGGCTCCCTTGAGCCCCGTGG +’ 2o

~CTGTGTCCCTTCCCTCCATCCACCCTGCTCCGAAACCCAGCCAGCCAGCTACCACTGCGGCCACCGCCCGAGGGGAC AAAGACACAGCCAAGGGAGGTAGCTGGGACGAGGC~CGGCGGTCGATGGTGACGCCGGTGGCGGGCTCCCCTG +200

4 TTGCGGAGGAGACGCCGGCGGGAGGAGGGCTTCGCAGCCCCCACCCCCACCCCCACCCCCAGCCCAGAGCATCCCTGCTG MCGCCTCCTCTGCGGCCGCCCTCCTCCCGMGCGTCGGGGGTGGGGGTGGGGGTGGGGGTCGGGTCTCGTAGGGACGAC +280

+ CAGCGCACCMGACCCGGGGCTGCCAGCCCCCAGGMTCCCTAG~GCTTAGCAGGGATMCGGA~GM~CTTGGAG GTCGC~TCGTTCTGGGCCCCGACGGTCGGGGGTCCTTAGGGATCGACGMTCGTCCCTATTGCCTGACTTCMGMCCTC +360

u (+381)

I CAGCGAmCMCTCrrCAAGgtaaogwogggCtC~tCtCtCCa~g C T C G C T C M G T T G A G A A ~ C c a t t t C t t t C C C ~ g ~ ~ g ~ ~ g g t C t C

+409

FIG. 5. DNAsequence of exon 1 and immediate flanking regions. The 1.8-kb EcoRI-BarnHI restriction fragment encompassing exon 1 was sequenced, and the portion shown here includes the complete exon 1 sequence (upper case), along with 120 bp of 5”flanking DNA sequence, and the first 28 bp of intron 1. Numbering is relative to the start of transcription. The first exon boundaries are also marked and numbered + I and +381. Predicted SP1 recognition sequences (41) are marked with shaded boxes, upward brackets denote predicted AP2 sites (42), and the heavy overfine indicates an mK-b site (43) that is conserved in the type I1 collagen promoter (44). The left-hand arrow shows the oligonucleotide primer that was used to map the start of transcription in Fig. 6, while the upward arrowhead points out the Tu91 site which is the 3’-end of the genomic fragment used as probe in the S1 mapping experiment of Fig. 6. The downward arrow marks the PstI site which is the 3’-end of the promoter fragment used in p64OCAT and related constructs. The 5’-end of the untranslated cDNA sequence reported (5) corresponds to nt +254.

14-kb gap region (data not shown). The close agreement of the restriction map of COS77 in the portions which overlap with phage clones and the corroboration of the structure of the 14-kb cosmid non-overlapping region in total genomic DNA indicates that no major rearrangement has occurred in this portion of the cosmid clone. One anomaly was seen in clone A7, which is apparently missing an EcoRI restriction site found in both COS77 and at the end of A l l . It is not known whether this difference may represent a real polymorphism or instead is some type of artifact.

Determination of Exon 1 Boundary-Several cDNA clones for rat aggrecan were isolated which extended 5’ of the translation start site, and these clones all contained varying lengths of the same untranslated leader sequence (5).’ Genomic clones were isolated with probes to this leader sequence, which was found to reside in a single exon. A 1.8-kb EcoRI-BamHI genomic frag- ment from the A5 genomic clone which contained the 5“leader exon was sequenced, and a 529-bp portion encompassing the exon is shown as Fig. 5. The 5‘-end of this exon was mapped by S1 nuclease protection (Fig. 6) using a 1.5-kb genomic TaqI fragment with its 3‘ terminus within the exon. This end-labeled probe when annealed to rat chondrosarcoma mRNA yielded a major 145-bp protected fragment, establishing the end of the exon as shown in Fig. 5 . There is an additional minor protected fragment of 60 bp greater size. Primer extension analysis con- firmed this result and showed there were no additional up- stream exons, since a primer chosen a t +87 (relative to the S1 start site) was extended to an 84-nucleotide cDNA fragment by reverse transcriptase (Fig. 6). While both S1 and primer exten- sion mapping gave two to three products of single base size differences, the major products from each technique map to within three bases of each other.

This high resolution mapping was by necessity performed using primers and fragments within the large first exon, leav- ing the possibility that there could be transcripts from an al- ternate promoter which also spliced out or did not include exon 1 and would not be detected in these experiments. This possi- bility was tested using other primers within downstream exons 2 and 3 for extension analysis, as well as an S1 probe that

spanned cDNA sequence from exons 1-3. Fig. 7 shows two different single primer extension products from Swarm rat chondrosarcoma mRNA consistent with the same major tran- scription start site mapped in Fig. 5 , although these larger products were not sized as precisely. Likewise, the S1 protec- tion analysis using a cDNA rather than genomic probe frag- ment showed complete protection of exons 1-3 by chondrosar- coma RNA, but not by tRNA, indicating there are no other major promoters which exclude exon 1 (Fig. 7).

Examination of the sequence surrounding the putative tran- scription start site (Fig. 5 ) showed neither an apparent TATAA sequence, nor a CCAAT box. There were, however, several GC- boxes (SP1 sites) typical of such TATAA-less promoters (41). Other potential transcription factor binding sites were identi- fied by data base search (TFD, kindly provided by Dr. David Ghosh, NCBI, NIH) including several AP-2 sites (42) and a site for NFK-b (43) a t -90 bp, which interestingly is conserved at the same position in the rat Col2al promoter sequence (44).

Analysis of Promoter FunctionSeveral constructs were pre- pared placing DNA fragments from the transcriptional start region into the pCAT family of reporter vectors (Promega), as summarized in Fig. 8. The initial fragment which was tested, P640, had 640 bp of 5”flanking DNA and 282 bp of the first exon, and this was cloned in forward (+) and reverse (-1 orien- tation in the pCAT-Basic vector. As shown in Table I, the P64OCAT+ construct reproducibly drove higher levels of CAT expression than the (-1 version when transfected into freshly isolated chick embryo sternal chondrocytes. This level of ex- pression is roughly 25% of that obtained using the SV40 pro- moter and enhancer to drive the CAT gene (pCAT-Control). When the SV40 enhancer was included in the construct with the forward-oriented P640 fragment (p640CAT-EN+), the CAT expression was almost 8-fold higher than the pCAT-Control vector. The reverse-orientation promoter in the same enhancer construct (p640CAT-EN-) gave background levels of CAT ac- tivity. Expression levels for both the enhanced and unenhanced promoters were substantially lower in chick embryo tendon fibroblast primary cells than in chondrocytes, relative to the pCAT-Control plasmid in both cell types. Larger fragments

Rat Aggrecan Gene and Promoter 29237

A B 1 12345

A B

1 2 3 1 2 3 4 5

84 145 -

C 145-NT S1-PROTECTED FRAGMENT

4 + -10 0 10 50 80 1 0 0 1 40

A b

84-NT PRIMER EXTENSION

FIG. 6. High resolution mapping of transcriptional start site. A, primer extension mapping of the end of aggrecan RNA transcripts from rat chondrosarcoma. The primer used is shown in Fig. 5 and begins at +84; extension products are in lane 1, and the adjacent lanes are a sequence ladder of the corresponding genomic sequence using the same primer (order ACGT). B, S1 nuclease mapping of exon 1 sequence pro- tected by the aggrecan RNA transcripts from rat chondrosarcoma. The genomic fragment was an end-labeled TaqI 1.5-kb fragment beginning at +145 and extending 5'; lane 1 shows the intact probe. Lanes 2 and 3, probe annealed with 10 pg of tRNA; lanes 4 and 5, probe annealed with 10 pg of poly(A)+ mRNA. Lanes 2 and 4,20 units of S1 nuclease; lanes 3 and 5, 80 units of S1 nuclease. The adjacent sequence ladder size marker is exon 12 template DNA sequenced with a primer beginning 160 nt before the 3'-end of the exon. C summarizes these results sche- matically, with numbers referring to the start of exon 1.

have been tested for promoter activity in this system; the larg- est extends an additional 3.2 kb upstream of p640 but produces similar levels of CAT activity as the shorter construct in trans- fected chick sternal chondrocytes (data not shown).

DISCUSSION This paper reports the first complete description of a gene

from the aggrecan-like family of proteoglycans. The rat aggre- can gene has been cloned from both phage and cosmid libraries, yielding over 80 kb of overlapping genomic clones. The coding sequences are contained in 63 kb and are split into 18 exons, which are rather larger than usual (45) and tend to encode structural or functional domains, with some exceptions. The G1 and G2 B motifs are each split into two exons, unlike the gene for link protein, where these are single exons (46). The Ba and Bb division occurs between sequences coding for 2 cysteines known to be disulfide linked in the protein, and this is a clear example of a structural motif split between exons. The G1-A and the G1- and G2-B' motifs are not split but are in single exons, as seen for the link protein gene. Another similarity to

C S1 -PROTECTED FRAGMENT

+80 +SO8 ' EXON 1 EXON 2' EXON 3 Y.C..Y,, , ,,, / / ,I / /

4 A -1 1

PRIMER EXTENSION +S62

FIG. 7. Wend mapping including downstream exons. A, primer extension products generated using primers in exons 2 and 3. Lane 2, primer from nt +562; lane 3, primer site at +451; lane 1, size markers of end-labeled restriction fragments of 685 and 235 nt. Gel is a 5% acryl- amide, 7 M urea sequencing gel, and arrows are estimated product sizes derived from comparison to markers. B, S1 protection analysis using as probe an end-labeled SacII-Sau96I cDNAfragment spanning exons 1-3, nt +80 to +508. Probe was annealed at 62 "C in S1 hybridization buffer (see "Experimental Procedures") to 15 pg of rat chondrosarcoma poly(A)+ mRNA (lanes 3 and 4 ) or 15 pg of tRNA, lanes 1 and 2. Diges- tion was with 30 units of S1 (lanes 1 and 3) or 90 units of S1 (lanes 2 and 4); undigested probe is in lane 5. Products were run on a 6% acrylamide, 7 M urea sequencing gel, C, schematic summary of these results.

the link protein gene is the size of the first intron, 30 kb or larger for both genes (47). The gene encoding CD44, which also contains a link B motif, follows the aggrecan pattern using two exons to encode this domain, with the division occurring in a similar position, splitting the pair of cysteines (48). The EGF and CRP motifs in G3 are encoded as structural units by single exons, as is seen in the selectins (49, 50). The LEC domain, however, is split into three exons, similarly to the rat asialo- glycoprotein receptor (51), but unlike the selectins, which en- code the lectin domain as a single exon (49, 50).

An extreme example of the correlation of gene product struc- ture-function with exon structure is seen in the nearly 3.5-kb exon 12, which encodes the entire chondroitin sulfate or glyco- saminoglycan attachment region of aggrecan. This region was defined as containing all the serine-glycine sequences in the molecule, which occur in several different repeating patterns (5). This size of exon is quite unusual for an internal exon not containing untranslated sequence (45) and may be related to the extensive rearrangement and duplication thought to be involved in the generation of this array of repeats.

The introns show typical 5"donor (g-t-a/g) and 3'-acceptor (c-a-g) splice sites (52), and for the most part are of the sym- metrical phase 1 type; exceptions are those between the Ba and Bb exons, which are phase 2-2 in both G1 and G2. This phase

29238 Rat Aggrecan Gene and Promoter

p64OCAT+

p640CAT-

p640CAT-EN+

p640CAT-EN-

pCAT-Basic

pCAT-Control FIG. 8. Reporter constructs used in transfection assays of pro-

moter activity. The top line of the diagram shows the 1.8-kb genomic fragment encompassing exon 1, with EcoRI ( E ) , BamHI (B), and PstI (PI sites indicated. The exon is shown throughout the figure as a solid bar. The EcoRI-PstI subfragment was Hind111 linkered and subcloned into a set of CAT reporter vectors. Horizontal arrows denote orientation of the promoter fragment in the reporter constructs, which are com- posed of CAT coding sequence (horizontal striped box). SV40 and pBR322 sequences are shown as light and dark stippled bores, respec- tively. The SV40 promoter is indicated by vertical stripes), and the SV40 enhancer is indicated by diagonal stripes.

TARLE I Promoter-driven CAT activity in transfected cultured cells

Chick sternal chondrocytes (CSC) or chick tendon fibroblasts (FIB) were cotransfected with the indicated constructs and pSVP as internal standard; extracts were prepared and assayed as described under “Ex- perimental Procedures.” Extracts were normalized for P-galactosidase activity, and CAT activity is expressed as percentage conversion of chloramphenical to the acetylated form. Data are averages of triplicate transfections within one experiment.

csc FIB”

p64OCAT+ 26 6

p640CAT-EN+ 792 181

pCAT-Control 100 100

p64OCAT- 11 3

p640CAT-EN- 14 3

CSCb

2.4 -c 0.4 0.26 * 0.05

p64OCAT+/p64OCAT- p640CAT+/pCAT-ControI

Cat activities expressed relative to pCAT-Control activity in the same experiment.

p64OCAT+ activity expressed relative to either p64OCAT- or pCAT- Control in chick sternal chondrocytes; mean and standard error of these ratios for three different experiments are shown.

change requires the flanking exons to be spliced together or result in loss of the reading frame. A similar situation is found in the set of three G3 lectin exons, where intron 15, between exons LECb and LECc, is of the symmetrical phase 0-0, again requiring concerted splicing or loss of frame. Thus, in the few cases in this gene where a structural or functional unit is bro- ken into multiple exons, the phases of the introns require the sets of exons to be spliced as a unit. The phase of the intron following the putative pseudo-exon 13x would result in loss of frame if the exon were not spliced out, supporting the conten- tion that EGFl is not expressed in rat aggrecan.

The EGF-like motif expressed at low levels in rat aggrecan is of the EGFB class. This same rat EGFB sequence was recently reported as part of another study on alternative splicing of G3 in different species (40). An EGF1-encoding exon in the rat aggrecan gene has undergone extensive mutation and appears to be a non-functional relic (Figs. 3 and 4); we have referred to this as a pseudo-exon 13x. Two human aggrecan EGF exons were mapped by PCR to positions corresponding to those of the rat aggrecan exons 13x and 13 (40). Human aggrecan tran- scripts more frequently possess an EGF1-like sequence than the EGFB motif and can express them together a t low levels

(40). The aggrecan, versican, and neurocan family members thus share the tandem G3-EGF exons at the gene structure level, if not usually at the protein level. There are pronounced species differences in the expression of these motifs for the aggrecan gene, however. The two types of EGF repeats are structurally distinct, with the EGFB type potentially capable of binding calcium (53 and references therein); i t is also the most conserved of the two EGF types.

Exon 18 contains the final 24 amino acids of the G3 domain, the termination codon, and an undetermined length of 3’-un- translated trailer. The polypeptide sequence encoded by this exon is not part of the CRP-like motif and has no recognized homology to known sequences. The size of the 3’-untranslated sequence is likely quite large, since the rat aggrecan tran- scripts on Northern blots measure about 8.2 and 8.9 kb (5),1-2 kb larger than the cDNA sequence (5 , and this report). There is an (A), sequence about 120 bp 3’ of the termination codon but no recognizable (AATAAA) polyadenylation signal in up to 170 bp sequenced 3’ of the termination codon (not shown).

The first exon is of special interest, since mapping i t defines the start of transcription. Although numerous cDNA clones for rat aggrecan were obtained with portions of exon 1 sequence at the 5’-end ( 9 , the longest was still about 65 bp short of the 5’-end reported here.2 The transcription start site was estab- lished by a series of primer extension and S1 protection experi- ments from within exon 1, and both techniques agree to within 3 bp. Both techniques also indicate that the major start site is a region of 2-3 bp in which initiation can occur, since two to three closely spaced bands are seen in Fig. 6. Furthermore, there appears to be a second, minor start site about 60 bp further 5‘; this is seen more clearly in the S1 protection in Fig. 6B, but a diffuse band in the same size region is also seen by primer extension in Fig. 6A. Reverse transcriptase PCR experi- ments have also provided evidence that the minor promoter site is utilized (data not shown). Thus the definition of the starting nucleotide for exon 1 is somewhat arbitrary; the major band in the primer extension experiment was selected, since it appears adjacent to a sequence ladder using the same primer, and transcriptional start sites occur most frequently at A nucleo- tides (54). This designation is also the most 3’ of the major start sites. It is noteworthy that the mouse aggrecan gene, which has recently been characterized,3 and is similar in many respects to the rat gene, has a promoter sequence well conserved with that of rat aggrecan, yet appears to initiate transcription primarily from a position corresponding to the upstream, minor rat promoter.

While these start site determinations were all consistent, it was considered possible that an alternate promoter down- stream of exon 1, or a different upstream promoter coupled with splicing out of exon 1, would not have been detected in these experiments. Complex alternative splicing has been re- ported as a feature of the chicken link protein exon 1 (47). The experiments in Fig. 7 are strong evidence against such possi- bilities, since primer extension from exons 2 and 3 give single products close to the predicted sizes, and a cDNA probe con- taining sequence from exons 1-3 was completely protected by chondrosarcoma RNAfrom S1 digestion. If there were substan- tial numbers of transcripts not containing exon 1, a smaller S1 product would have been detected.

The sequence in Fig. 5 shows this promoter to be of the TATA-less class, and accordingly several SP1-binding sites are found in the vicinity of the start site, as is commonly seen with such promoters (41). It is also common for promoters lacking the TATA signal to show multiple start sites for transcription,

H. Watanabe, L. Gao, S. Sugiyama, K. Doege, K. Kimata, and Y. Yamada, manuscript submitted for publication.

Rat Aggrecan Gene and Promoter 29239

consistent with the 2-3 bp "range" detected for the start site of aggrecan transcripts. The 120 bp of 5"flanking sequence is 72% G+C, and contains four potential AP-2 sites (421, two of which overlap with potential SP1 sites. The first exon sequence is also GC-rich at 66% and also contains a cluster of four overlapping potential AP-2 sites. The overlined sequence at -105 to -84 is of particular interest, since it is the most conserved sequence (81%) in a comparison with the rat type I1 collagen (COL2A1) promoter (441, where it occurs in the same relative position (-123 to -103). This sequence is also a potential binding site in the aggrecan promoter (but not the COLZA1 promoter) for NFK-b, a factor known to interact with various cytokines (431, which may also play a role in chondrocyte gene regulation. The link protein gene promoter (56) also has conserved sequences with the aggrecan promoter, particularly the region -81 to -37 in the aggrecan gene, which is 73% conserved with -74 to -37 of the link protein gene, excluding a 7-nt gap. This region in aggrecan has two predicted binding sites for the AP-2 factor, while these predicted sites are not conserved in this region of the link protein promoter. There are no sequences particularly conserved among all three of the gene promoters. Both the aggrecan and COLZA1 promoters, but not that of link protein, contain numerous clusters of SP1 sites. The potential binding sites for transcription factors have not yet been shown to func- tion in aggrecan gene regulation. All three of these genes are expressed predominantly in cartilage but have different extra cartilage patterns of expression, aggrecan being the most re- stricted (13-15). Comparing the regulatory elements common to these genes may give clues regarding early chondrocyte dif- ferentiation signals.

The function of this putative aggrecan gene promoter has been investigated in transient transfection assays. A DNA frag- ment containing 640 bp of 5"flanking sequence and 282 bp of exon 1 sequence was reproducibly almost 2.5 times more active a t driving CAT expression in chick chondrocytes than its re- verse orientation, and the level of CAT expression was 25% of that driven by the strong SV40 promoter/enhancer combination in chondrocytes. While this promoter is also more active than its reverse orientation in fibroblasts, the level of CAT activity produced is more than 4-fold lower when compared with the SV40 control promoter/enhancer plasmid. Similar results are obtained with this fragment cloned in both orientations in the CAT reporter vector containing the SV40 enhancer (but no SV40 promoter): the expression by the forward orientation pro- moter is enhanced about 30-fold, but the reverse orientation promoter still drives only background levels of CAT activity despite the presence of the enhancer. The enhanced expression is over 4-fold higher in chondrocytes than in fibroblasts, rela- tive to the SV40 control plasmid.

These data indicate that the p640 fragment is a functional promoter. It is not clear if this fragment contains tissue-specific control elements, since the observed higher expression in chon- drocytes relies on the assumption that the reference plasmid, pCAT-Control, shows constant promoter strength in the two cell types, and only responds to nonspecific factors such as transfection efficiency and cell number. It should be noted that the p640-En construct contains the same enhancer element as pCAT-Control and still shows the same degree of stimulation relative to the control in chondrocytes as does the unenhanced p640. This would suggest there are no cell-specific effects on the SV40 enhancer at least. The strong enhanceability of the p640 fragment by the heterologous SV40 enhancer suggests the pos- sibility of an endogenous enhancer element in the aggrecan gene, although this observed effect by a general enhancer may be fortuitous, and such an element has so far not been identi- fied. Definitive evidence of such cell-specific elements in the

aggrecan gene will depend on the demonstration of loss or gain of function in constructs with the elements present or absent. Furthermore, the regulation of endogenous aggrecan gene ex- pression at the transcriptional level has not been characterized quantitatively, either in vivo or in different cell types in culture. It is assumed that fibroblasts are transcribing the aggrecan gene at a much lower level than are primary chondrocytes, based on the steady state levels of aggrecan mRNA in these cells (41, but the relative contribution of new transcription and message stabilization to aggrecan steady state mRNA levels has not been evaluated for these cells. The chick primary chon- drocyte system was used in the characterization of the promot- ers for the rat type I1 collagen and link protein genes (55, 561, making this the most appropriate system in which to initiate comparable studies on aggrecan. We are currently seeking to establish a cell culture system with a known degree of tran- scriptional activation of the endogenous aggrecan gene relative to other chondrocyte-expressed genes so that we may have a base line to help identify aggrecan DNA sequences which confer the appropriate regulation in transfection studies.

REFERENCES 1. Wight, T. N., Heinegard, D. K., and Hascall, V. C. (1991) in Cell Biology ofthe

2. Kimura, J. H., Shinomura, T., and Thonar, E. J.". A. (1987) Methods Extracellular Matrix (Hay, E. D., ed) pp. 45-78, Plenum Press, New York

3. Yamagata, M., Shinomura, T., and Kimata, K. (1993) Anat. Embryol. 187, Enzymol. 144, 372-393

433-444

Biol. Chem. 261,8108-8111

Biol. Chem. 262, 17757-17767

Sci. U. S. A. 83, 5081-5085

4. Doege, K., Fernandez, P., Hassell, J. R., Sasaki, M., and Yamada, Y. (1986) J.

5. Doege, K., Sasaki, M., Horigan, E., Hassell, J. R., and Yamada, Y. (1987) J.

6. Sai, S., Tanaka, T., Kosher, R. A,, and Tanzer, M. L. (1986) Proc. Natl. Acad.

7. Chandrasekaran, L., and Tanzer, M. L. (1992) Biochem J. 288,903-910 8. Oldberg, A.,Antonsson, P., and Heinegard, D. (1987)Biochem. J. 243,255-259 9. Antonsson, P., Heinegard, D., and Oldberg, A. (1989) J. Biol. Chem. 264,

10. Baldwin, C. T., Reginato, A. M., and Prockop, D. J. (1989) J. Biol. Chem. 264,

11. Doege, K., Sasaki, M., Kimura, T., and Yamada, Y. (1991) J. Biol. Chem. 266,

16170-16173

15747-15750

894-91~ 12. Walcz, E., Deak, F., Erhardt, P., Coulter, S. N., Fulop, C., Horvath, P., Doege,

~~. ."

K.. and Glant. T. (1994) Genomics 22. 364-371 I

13. Stirpe, N. S., and Goetinck, P. F. (1989) Development 107, 22-33 14. Wong, M., Lawton, T., Goetinck, P. F., Kuhn, J. L., Goldstein, S. A,, and

15. Kosher, R. A., and Solursh, M. (1989) Deu. Biol. 131,558-566 16. Oxford, J. T., Doege, K. J., Horton, W. E., and Moms, N. P. (1994) Exp. Cell Res.

17. Sandell, L. J., Morris, N., Robbins, J. R., and Goldring, M. B. (1991) J. Cell

18. Boyd, C. D., Pierce, R. A,, Schwarzbauer, J. E., Doege, K., and Sandell, L.

19. Zimmerman, D. R., and Ruoslahti, E. (1989) EMBO J . 8, 2975-2981 20. Rauch, U., Karthikeyan, L., Maurel, P., Margolis, R. U., and Margolis, R. K.

21. Yamada, H., Watanabe, K., Shimonaka, M., and Yamaguchi, Y. (1994) J. Biol.

22. Kimata, K., Oike, Y., Tani, K., Shinomura, T., Yamagata, M., Uritani, M., and

23. Drickamer, K. (1988) J. Biol. Chem. 263, 9557-9560 24. Hourcade, D., Holers, V. M., and Atkinson, J. P. (1989) Advances in

25. Gray, A,, Dull, T. J., and Ullrich, A. (1983) Nature 303, 236240

27. Korenberg, J. R., Chen, X. N., Doege, K., Grover, J., and Roughley, P. J. (1993) 26. Springer, T. A. (1990) Nature 346,425-434

Genomics 16,546-548 28. Finkelstein, J. E., Doege, K., Yamada, Y., Pyeritz, R. E., Graham, J. M.,

Moeschler, J. B., Pauli, R. M., Hecht, J. T., and Francomano, C . A. (1991)

29. Li, H., Schwartz, N. B., and Vertel, B. M. (1993) J. Biol. Chem. 268, 23504- Am. J. Hum. Genet. 48, 97-102

23511 30. Watanabe, H., Kimata, K., Line, S., Strong, D., Gao, L., Kozak, C. A., and

31. Tanaka, T., Har-el, R., and Tanzer, M. L. (1988) J. Biol. Chem. 263, 15831- Yamada, Y. (1994) Nature Genetics 7, 154-157

15835

33. Doege, K., Rhodes, C., Sasaki, M., Hassell, J., and Yamada, Y. (1990) in 32. Doege, K., Sasaki, M., andYamada,Y. (199O)Biochem. SOC. Dans. 18,200-202

Collagen Genes: Structure, Regulation and Abnormalities (Boyd, C., Byers,

34. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A P., and Sandell, L., eds) pp. 137-155, Academic Press Inc., Orlando, FL

Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY

Bonadio, J. (1992) J. Biol. Chem. 267,5592-5598

213,28-36

Biol. 114,1307-1319

(1993) Matrix 13,457469

(1992) J. Biol. Chem. 267, 19536-19547

Chem. 269, 10119-10126

Suzuki, S. (1986) J. Biol. Chem. 261, 13517-13525

Immunology 45 (Dixon, F. I. ed) pp. 381416, Academic Press, New York

29240 Rat Aggrecan Gene and Promoter 35. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proe. Natl. Acad. Sci. U. S. A

36. Deininger, P. G. (1983) Anal. Biochem. 129,216-223 37. Devereux, J., Haeberli, P., and Smithies, 0. (1984) Nucleic Acids Res. 12,

74,5463-5467

387-395 38. Chomczynski, P., and Sacchi, N. (1987)Anal. Biochem. 162,156-159

40. Fulop, C., Walcz, E., Valyon, M., and Glant, T. (1993) J. Bid. Chem. 268, 39. Sharp, P. A. (1981) Cell 23,643-646

17377-17383 41. Kageyama, R., Merlino, G. T., and Pastan, I. (1989) J. Biol. Chem. 264,15508-

42. Imagawa, M., Chiu, R., and Karin, M. (1987) Cell 51,251-260 43. Lenardo, M. J., and Baltimore, D. (1989) Cell 68, 227-229 44. Kohno, K., Sullivan, M., and Yamada, Y. (1985) J. Biol. Chem. 260, 4441-

45. Hawkins, J. D. (1988) Nucleic Acids Res. 16, 9893-9908 46. Rhodes, C., Doege, It, Sasaki, M., and Yamada, Y. (1988) J. Biol. Chem. 263,

15514

4447

60634067

47. Deak, F., Barta, E., Mestric, S., Biesold, M., and Kiss, I. (1991) Nucleic Acids Res. 19,4983-4990

48. Screaton, G. V., Bell, M., V., Jackson, D. G., Cornelis, F. B., Gerth, U., and Bell, J. I. (1992) Proc. Natl. Acad. Sci. U. S. A. 89,12160-12164

49. Johnston, G. I., Bliss, G. A,, Newman, P. J., and McEver, R. P. (1990) J. Biol. Chem. 266,21381-21385

50. Collins, T., Williams, A,, Johnston, G. I., Kim, J., Eddy, R., Shows, T., Gimbrone, M. A,, Jr., and Bevilacqua, M. P. (1991) J. Biol. Chem. 266,

51. Leung, J. O., Holland, E. C., and Drickamer, K. (1985) J. Biol. Chem. 260, 2466-2473

52. Breathnach, R., and Chambon, P. (1981)Annu. Reu. Biochern. 50,349383 12523-12527

53. Campbell, I. D., and Bork, P. (1993) Cur): Opin. Struct. Biol. 3, 385-392

55. Horton, W., Miyashita, T., Kohno, K., Hassell, J. R., and Yamada, Y. (1987) 54. Penotti, F. E. (1990) J. Mol. Biol. 213, 37-52

56. Rhodes, C., Savagner, P., Line, S., Sasaki, M., Chirigos, M., Doege, K., and Proc. Natl. Acad. Sci. U. S. A. 84,8864-8868

Yamada, Y. (1991) Nucleic Acids Res. 19, 1933-1939