Structural organization of the bovine thyroglobulin gene and of its 5′-flanking region



Eur. J. Biochem. 164, 591-599 (1987) © FEBS 1987

Structural organization of the bovine thyroglobulin gene and of its 5′-flanking region

Guy de MARTYNOFF¹, Viviane POHL³, Luc MERCKEN¹, Gert-Jan van OMMEN⁴ and Gilbert VASSART¹,²

¹ Institut de Recherche Interdisciplinaire, ² Service de Chimie and ³ Service d’Histologie, Faculté de Médecine, Université Libre de Bruxelles
⁴ Department of Anthropogenetika, University of Leiden

(Received December 29, 1986) - EJB 86 1401

The structural organization of the bovine thyroglobulin gene has been investigated by a combination of Southern genomic blotting and direct analysis of cloned gene fragments isolated from a chromosomal DNA library. The entire locus is spread over more than 200000 base pairs, which makes it one of the largest eukaryotic genes studied to date. The coding information is scattered into at least 42 exons, 34 of which have been precisely identified. A different evolutionary origin of the 5′ and 3′ regions of the gene is supported by the highly different proportion of exonic material they contain (12% and 3%, respectively) and by the existence of sequence homology between the 3′ region of thyroglobulin and acetylcholinesterase. Detailed sequence analysis of the 5′ region of the gene and its flanking segment demonstrated that a significant homology exists between bovine and human thyroglobulin sequences, except for the presence within the ruminant promoter region of a 220-base-pair sequence belonging to the bovine monomer repeated family.

Before their release in the peripheral blood stream, thyroid hormones are synthesized and stored within a thyroid-specific precursor protein, thyroglobulin [1]. This exceptionally large glycoprotein (Mr 660000; 2 × 2700 amino acids) contains two identical polypeptide chains translated from an 8.4-kb mRNA [2]. Under optimal conditions, each subunit will couple a fraction of its iodinated tyrosyl residues in order to generate only a few hormonally active tri- and tetraiodothyronines. From this point of view, thyroid hormone biosynthesis appears as a wasteful phenomenon. However, even in conditions of severe iodine deficiency, thyroglobulin is able to yield significant amounts of thyroid hormones [3]. This observation suggests that specific regions of this very large molecule are involved in hormonogenesis. Recently, determination of the complete nucleotide sequence of the bovine thyroglobulin messenger RNA and the precise localization of four important hormonogenic domains in the polypeptide chain [4] have clarified this situation.

In the present study, we have taken advantage of the availability of the cloned thyroglobulin cDNA [5-7] to investigate the structure of the bovine thyroglobulin gene and of its 5′-flanking sequence.

EXPERIMENTAL PROCEDURES

Preparation of thyroglobulin DNA probes

The construction of recombinant plasmids containing 99% of bovine thyroglobulin mRNA sequences has been described elsewhere [5, 6]. The probes were obtained by complete digestion of these clones and the purification of the cDNA insertion by electroelution from agarose gels. Nick translation was performed according to Rigby et al. [8].

Correspondence to G. Vassart, Institut de Recherche Interdisciplinaire (IRIBHN), Campus Erasme, Université Libre de Bruxelles, 808, Route de Lennik, B-1070 Brussels, Belgium

Abbreviation. SSC, standard saline/citrate = 0.15 M sodium chloride, 0.015 M trisodium citrate, pH 7.0.

Southern blotting analysis of genomic DNA

Chromosomal DNA was prepared from individual fresh livers, frozen and crushed in liquid nitrogen, essentially as described in [9]. Restriction digests of total bovine DNA (5-10 µg) were electrophoresed overnight on a 1% agarose gel and transferred onto nitrocellulose filter (Schleicher & Schüll, BA85) by Southern blotting in ammonium acetate [10].

Hybridizations in the presence of dextran sulfate [11] and washing of the filters were performed according to [12]. In some instances, the filters were reused after elimination of the probe in 0.03 M NaOH (twice for 10 min at 50 °C) followed by neutralization in 3 × SSC (three times for 10 min at room temperature).
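As a quick check on the neutralization buffer, the composition of 3 × SSC follows directly from the 1 × definition given in the abbreviation note (0.15 M sodium chloride, 0.015 M trisodium citrate). The following is a minimal sketch of that arithmetic; the function name is illustrative only.

```python
# 1x SSC as defined in the paper's abbreviation note (molar concentrations).
SSC_1X = {"sodium chloride": 0.15, "trisodium citrate": 0.015}

def ssc_composition(fold):
    """Return molar concentrations for an n-fold SSC solution."""
    return {salt: round(conc * fold, 6) for salt, conc in SSC_1X.items()}

three_x = ssc_composition(3)
print(three_x)
```

With these definitions, 3 × SSC corresponds to 0.45 M sodium chloride and 0.045 M trisodium citrate.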

Construction of a bovine genomic library

We used the cosmid vector pJBF, conferring ampicillin resistance and constructed from cosmid pJB8 (a pBR322 derivative). Equal amounts of pJBF were digested either with ClaI or with BstEII, treated by phosphatase and finally recleaved with BamHI. The vector arms were purified by sucrose gradient in order to obtain fragments with a ligatable BamHI hemi-site either rightwards or leftwards from the cos sequence and a non-ligatable (dephosphorylated) end at the other side.

Both preparations of cosmid fragments were ligated with the same amount of semi-randomly generated 30-50-kb bovine DNA. These very large fragments of chromosomal DNA were prepared from frozen bovine liver [13], partially digested with MboI and size-fractionated on a sucrose gradient.


Fig. 1. Physical map of the bovine thyroglobulin gene. The gene is drawn to scale in the 5′ to 3′ direction. Its structure is schematically represented by an open bar; exons are denoted by filled areas and introns by open areas. The two diagonal lines represent a gap of unknown size not present in any of the genomic clones. The recognition sites of four restriction endonucleases are shown in the next series of lines. Asterisks denote sites that are present in the cDNA. The circled EcoRI sites border the 8.8-kb fragment containing the first five exons. The regions corresponding to genomic DNA inserts in five cosmid clones are indicated at the bottom.

In vitro packaging in phage lambda particles and transfection were carried out essentially as described by Grosveld et al. [14]. The host strain Escherichia coli 1046 (803, SuIII, RecA⁻, rk⁻, mk⁻) was kindly given by Ed Fritsch. Plating on ampicillin plates (30 µg/ml) and screening were performed according to Hanahan and Meselson [15] in the hybridization conditions described above.

Bulk quantities of selected cosmid DNA were prepared by the lysis procedure as described by Maniatis et al. [16] without the amplification step in the presence of chloramphenicol.

Restriction mapping of bovine genomic clones

For every isolated thyroglobulin gene fragment, restriction site localization (EcoRI, HindIII and BamHI) was confirmed independently by 'criss-cross' hybridization analysis [17], multiple restriction analysis and partial restriction analysis in 0.6% agarose gels followed by Southern blotting. Various probes were prepared by nick translation, full-length cDNA synthesis [18] and primer extension of recombinant single-stranded M13 clones. Filter hybridizations were performed as described previously or, more recently, in Blotto medium [19].

Electron microscopy of nucleic acid hybrids

Bovine 33-S thyroglobulin mRNA (300 ng) was hybridized to cosmid DNA (±25 ng) in 70% formamide, 100 mM Pipes, 23 mM Tris (pH 7.8), 5 mM EDTA, 500 mM NaCl, 200 mM KCl for 3-24 h at 58-60 °C [20]. Nucleic acids were spread [21], examined under a Jeol JEM 100 B (accelerating voltage 60 kV) electron microscope and measured with a Hipad™ digitizer (Bausch and Lomb) connected to a Panasonic JD-850 M computer. Double and single-stranded DNA of ΦX174 phage (5386 bp) were used as length standards.
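The length calibration described above amounts to a simple proportionality: a contour length measured on the digitizer is converted to base pairs using the mean measured length of the ΦX174 standard (5386 bp). The sketch below is illustrative only. The function names and the digitizer readings are invented for the example and are not data from the study, and separate double-stranded and single-stranded standards would be applied to the corresponding regions of a hybrid.

```python
from statistics import mean

PHIX174_BP = 5386  # length of the PhiX174 standard, as stated in the text

def bp_per_unit(standard_lengths):
    """Calibration factor (bp per digitizer unit) from repeated
    measurements of the PhiX174 standard."""
    return PHIX174_BP / mean(standard_lengths)

def contour_to_bp(contour_length, standard_lengths):
    """Convert one measured contour length to an estimated size in bp."""
    return contour_length * bp_per_unit(standard_lengths)

# Hypothetical digitizer readings (arbitrary units), not data from the paper.
standards = [1.78, 1.82, 1.80]
intron_estimate = contour_to_bp(0.60, standards)
print(round(intron_estimate))
```

Averaging several measurements of the standard before converting is one reasonable way to reduce the effect of spreading artefacts on any single molecule.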

DNA sequencing

Standard recombinant DNA manipulations were conducted according to Maniatis et al. [16]. DNA nucleotide sequences were determined from M13 clones using chain-terminator methods [22] and analyzed on a Wang VS 50 A computer using programs written by S. Swillens (unpublished).

RESULTS

Southern analysis of the bovine thyroglobulin gene

Recombinant plasmids containing 99% of the 8431-base thyroglobulin mRNA sequence [5, 6] were used in order to determine the gross structure of the bovine thyroglobulin gene by Southern blotting. Independent endonuclease analyses (EcoRI, BamHI, HindIII) demonstrated that the coding sequence in the thyroglobulin locus was scattered amongst at least 30 exons distributed over more than 130 kb. A considerable asymmetry appeared at the genomic level: the ratio between the size of the probe and that of the corresponding chromosomal DNA varied from less than 1/8 in the 5′ half of the gene to more than 1/27 in the 3′ region.
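The asymmetry can be restated as the fraction of genomic DNA that is exonic: a probe-to-genomic ratio of 1/8 corresponds to 12.5% exonic material and a ratio of 1/27 to about 3.7%, roughly in line with the 12% and 3% quoted in the summary. A minimal sketch of the arithmetic follows; the text gives these ratios as bounds, so the percentages are approximate, and the function name is illustrative.

```python
from fractions import Fraction

def exonic_percent(ratio):
    """Express a probe/genomic size ratio as a percentage of exonic DNA."""
    return float(ratio) * 100

five_prime = exonic_percent(Fraction(1, 8))    # 5' half of the gene
three_prime = exonic_percent(Fraction(1, 27))  # 3' region

print(f"5' half: {five_prime:.1f}%  3' region: {three_prime:.1f}%")
```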

Extensive polymorphism seems not to be responsible for this large size since only low-frequency restriction fragment length polymorphism was detected for different restriction enzymes (BamHI, EcoRI, BstEII) in several individual bulls (data not shown).

Isolation of thyroglobulin gene fragments from a bovine cosmid library

A cloning procedure, slightly modified from the original one of Ish-Horowicz and Burke [23], was used to insert 32-48-kb segments of bovine genomic DNA between both arms of the cosmid vector pJBF (see Experimental Procedures). From about 300000 colonies, more than 50 positive clones were picked and rescreened with different contiguous exonic probes. Five cosmids were selected to explore in detail 200 kb of genomic DNA. Despite many attempts to isolate the missing chromosomal regions, two gaps were still present for which no cosmid could be found. The gene organization in these regions encoding about 1500 bp of exonic material was inferred from blotting experiments (see below).

Structural organization of bovine thyroglobulin gene

Electron microscopy of mRNA·DNA hybrids and precise restriction mapping of these five cosmids allowed us to draw up the chromosomal organization of more than 160000 bp of the gene, including a large portion (80%) of the coding sequence (Fig. 1). Beginning 12 kb upstream from the 5′ end of the gene, two partially overlapping cosmid clones (CBT1 and CBT2, Fig. 2A and B) contain together two thirds of the mRNA sequence spread over 70000 nucleotides (Table 1). The 27 exons detected in this region are, as partially described for the human gene [24], characterized by their small size (100-200 bp) except for two adjacent ones (exons 9 and 10, 1090 and 550 nucleotides respectively) which are separated by a relatively short intron of 500 bp. An internal clone (CBT5 probed with 0.8-kb cDNA clone; Fig. 2C) shows only two small coding sequences bordered by a purely intronic segment larger than 35000 bp. In addition, several genomic clones, covering the 3′ end and 27 kb of downstream sequence, contain two introns of more than 15000 bp in length (introns -3 and -5).

Table 1. Exon-intron organization of the bovine thyroglobulin gene. The sizes of exons (E) and introns (I) have been estimated by electron microscopy using ΦX174 double and single-stranded DNA as length standards (5386 bp) (N = number of samples).

Hybridization with total genomic DNA reveals the presence of repetitive DNA sequences in almost every intron. The presence of inverted repeats in the 3′ region of the gene is demonstrated by electron microscopy with the formation of two hairpin structures (2270 bp and 1050 bp long) in intron -3 (CBT20; indicated by an arrow in Fig. 2D). In less stringent reannealing conditions, a secondary structure (160 bp long) presenting a 2200-nucleotide loop can be reproducibly detected 160 bp downstream from the 3′ extremity of the gene (Fig. 2E).

A detailed restriction map for the endonucleases EcoRI, BamHI, HindIII and SalI (see Experimental Procedures) was established for the region of the gene contained in our cosmids. This analysis is in perfect agreement with the data obtained previously by chromosomal Southern studies.

cDNA probes corresponding to the gene segments not included in our cosmids were used in genomic blotting to determine the minimal number of exons contained in these regions. Results from restriction-fragment-length polymorphism analyses with ten endonucleases [25] helped us to select enzymes generating a particularly rich pattern of hybridizing bands in these portions of the gene (BamHI, PstI, PvuII, TaqI; results with EcoRI in Fig. 3D). Eight EcoRI fragments hybridizing to the cDNA probes were identified which have no counterpart in the neighbouring cosmids (Fig. 3). As there is no EcoRI site in this region of the cDNA, this brings to six and two the minimal number of exons encoded within the 5′ and 3′ gaps, respectively. A minimal size for the two gaps of 32 kb and 17 kb may similarly be inferred.

Sequence analysis of the 5’ region of the gene

Cosmid clone CBT1, containing 42000 bp of chromosomal DNA, was used to characterize more precisely the 5′ end of the gene. An 8.8-kb EcoRI fragment, containing the first five exons, was analyzed by several restriction endonucleases and partially subcloned in M13 phages. The nucleotide sequences of the first two exons and of the 900 base pairs preceding the transcription start were determined by the chain-terminator method [22] (Fig. 4).

The sequence of the first exon (108 bp) allowed the sequence of the 41-nucleotide 5′-untranslated portion of the bovine thyroglobulin mRNA to be confirmed on both strands [26]. This exon also encodes the 19-amino-acid hydrophobic signal peptide and the first three residues of the mature protein [26].
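The stated length of the first exon can be checked against its parts. Assuming that intron 1 interrupts the Glu codon after its first nucleotide (this one-nucleotide split is read from the sequence in Fig. 4 and is treated here as an assumption), the 41-nucleotide untranslated leader, the 19 codons of the signal peptide, the first three codons of the mature protein and one nucleotide of the split codon add up to 108 bp. The variable names below are illustrative.

```python
UTR_5_PRIME = 41      # untranslated leader, in nucleotides
SIGNAL_PEPTIDE = 19   # codons of the signal peptide
MATURE_RESIDUES = 3   # complete codons of the mature protein in exon 1
SPLIT_CODON_NT = 1    # assumed: first nucleotide of the interrupted Glu codon

exon1_length = UTR_5_PRIME + 3 * (SIGNAL_PEPTIDE + MATURE_RESIDUES) + SPLIT_CODON_NT
print(exon1_length)
```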

The second exon starts within an important domain involved in thyroid hormone formation [27]: intron 1 interrupts the Glu codon immediately preceding the very first tyrosine (at position 5 in the mature protein), which corresponds to a major thyroxine-forming residue in bovine thyroglobulin. Furthermore, this coding segment ends in the middle of another distinctive element of thyroglobulin, a cysteine-containing sequence repeated ten times in the amino-terminal half of the protein [4].

The only discordance detected between the genomic sequence and the cloned cDNA is a synonymous G to C transversion in codon 35 (Glu: GAG → GAC).

The region upstream from the transcriptional initiation site reveals the presence of a canonical Goldberg-Hogness box (‘TATAAA’) and a ‘CAAT’ homology sequence (‘CACT’) beginning respectively at positions -31 and -82 [28].

In addition to several palindromic sequences, a 38-bp stretch of homo(purine)·homo(pyrimidine) is located between positions -556 and -518; this relatively short segment presents a high similarity with the 209-nucleotide homopurine sequence described in the promoter region of the human gene (-512 to -304) [7]. Like its human counterpart, this segment displays hypersensitivity to S1 nuclease when included in a supercoiled recombinant plasmid (data not shown).
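One way to locate such a stretch in a sequence file is to scan for the longest window in which purines dominate. The sketch below is only an illustration: the toy sequence in the usage note, the function name and the rule of tolerating a fixed number of pyrimidine interruptions are assumptions, not the procedure used in this study.

```python
PURINES = set("AG")

def longest_purine_rich_run(seq, max_pyrimidines=3):
    """Return (start, end) of the longest stretch that begins and ends
    with a purine and contains at most `max_pyrimidines` pyrimidines.
    Coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    best = (0, 0)
    left = 0
    pyr = 0
    for right, base in enumerate(seq):
        if base not in PURINES:
            pyr += 1
        # shrink the window until it holds few enough pyrimidines
        while pyr > max_pyrimidines:
            if seq[left] not in PURINES:
                pyr -= 1
            left += 1
        # trim so the candidate stretch starts and ends on a purine
        start, end = left, right + 1
        while start < end and seq[start] not in PURINES:
            start += 1
        while end > start and seq[end - 1] not in PURINES:
            end -= 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best

# Hypothetical toy sequence, not taken from the thyroglobulin promoter.
toy = "TTCTAGGAAGGACGGAGGGAGCTTC"
print(longest_purine_rich_run(toy, max_pyrimidines=1))
```

On the toy sequence, the 17-nucleotide purine-rich core with a single C interruption is reported as the best window when one pyrimidine is tolerated.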

Both homopurine stretches show an interestingly similar arrangement: the bovine one seems to be flanked by a short direct repeat [(A)TTG], itself bounded by a 7-bp terminal palindromic repeat (AGAAACA-----TGTTTCT). This particular structure is, however, less conserved in the human sequence.
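The palindromic relationship between the two 7-bp terminal sequences can be verified by taking the reverse complement of one arm and comparing it with the other. A minimal sketch follows; the helper name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

left_arm = "AGAAACA"   # 5' arm quoted in the text
right_arm = "TGTTTCT"  # 3' arm quoted in the text

is_inverted_repeat = reverse_complement(left_arm) == right_arm
print(is_inverted_repeat)
```

The two arms are exact reverse complements of each other, which is what allows them to be described as a terminal inverted (palindromic) repeat.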

Despite the similarity of both promoter regions downstream from the homopurine stretches and upstream from the 5′-cap site, a 220-bp segment is found in the bovine gene which has no counterpart in the human sequence. This inserted element is flanked by short direct repeats (GCTC; found in both genes), internally bordered by the same longer inverted repeat (GGTTC T); it belongs to the family of bovine dispersed repeat units (BMF) [29] as shown by sequence analysis. Characteristics are illustrated in Fig. 5 and its homology with the bovine monomer repeat prototype is analyzed in Fig. 6.

Fig. 2. Electron microscopy of hybrids between thyroglobulin mRNA and cloned bovine genomic DNA. Five recombinant cosmids were heat-denatured and reannealed with the 8431-base thyroglobulin mRNA as described in Experimental Procedures. The DNA·RNA hybrids have been oriented with their 5′ side on the left. (A) Cosmid CBT1; (B) cosmid CBT2; (C) cosmid CBT5; (D) cosmid CBT20; (E) cosmid CBT15. Triple hybrids were realized with EcoRI-cut pBR322 plasmid in order to localize the specific pJBF cosmid sequence in the recombinant clone. Accompanying each micrograph is the interpreting tracing. Tracing key: RNA; genomic DNA; plasmid sequences (pBR322 × pJBF). Scale bar, 0.2 µm.

T

C

DISCUSSION

During these past 15 years, thyroid hormonogenesis has progressively appeared as a wasteful biosynthetic process. In vivo, less than ten iodothyronines are produced within specific domains (‘hormonogenic sites’) of thyroglobulin [3]. Both subunits of this particularly large glycoprotein (2 × 2700 amino acids) are translated from an 8500-base mRNA [2]. With a size in excess of 200000 bp, the bovine thyroglobulin locus is amongst the largest eukaryotic genes. The 8431-base mRNA is encoded in at least 42 exons, the large majority of which are of a homogeneously small size (100-200 bp). In contrast, the introns show a broad length range, varying from 50 to more than 35000 bp. This chromosomal organization is very similar to that of other giant genes, such as the one coding for human clotting factor VIII (186 kb, 26 exons, some very large introns) [30].

Transcription of such large genes could take several hours and the correct removal of their intron transcripts should require extreme precision in the splicing machinery. However, the recent study of a hereditary defect of thyroglobulin gene expression in Afrikander cattle led to the demonstration that thyroids from normal animals contain misspliced transcripts as minor mRNA species [31].

As compared with recent data obtained from similar studies of the human and rat thyroglobulin genes [32, 33], the chromosomal organization is remarkably well conserved during evolution as far as exon number and size are concerned. In contrast, the length of introns (and their base sequences, at least for the first two) shows a higher degree of variation. Paradoxically, the large proportion of intronic material in the thyroglobulin gene is accompanied by a relatively low restriction fragment length polymorphism for several endonucleases [25, 34].

As has been reported previously [4], protein sequence analyses show that a large portion of bovine thyroglobulin (over 75%) contains three types of cysteine-containing homologous domains. The ten-times repetition of a 60-amino-acid motif (type I) in the amino-terminal half of the protein and the presence of the two other kinds of adjacent repetitive units in the rest of the molecule clearly indicate that thyroglobulin arose by segmental duplication of a limited number of ancestral smaller sequences. Extensive sequence homology seems not to exist between thyroglobulin and other protein families except for the recent demonstration of a 30% homology between acetylcholinesterase of Torpedo californica and the carboxy-terminal portion of thyroglobulin [35, 36]. Unfortunately, in the absence of data concerning the gene organization of acetylcholinesterase, it is not yet possible to interpret this finding in the light of the exon shuffling theory [37]. The asymmetrical intronic content displayed by each half of the thyroglobulin locus argues in favor of their different evolutionary origins.

Fig. 3. Blot hybridization analysis of the 3′ half of the bovine thyroglobulin gene. (A) Physical map of the 3′ portion of the bovine thyroglobulin cDNA [6]. Restriction endonuclease sites shown are HindIII (H), SstI (S) and PstI (P). (B) Relative positions of the exon inserts of three cosmid clones and (C) of contiguous cDNA probes used in genomic blots. (D) Southern blot analysis of EcoRI chromosomal fragments hybridizing to the five exon probes. Genomic fragments overlapping two adjacent cDNA sequences are indicated by a dash. Gene segments included in our cosmids are underlined (a, CBT2; b, CBT5; c, CBT20).

It is interesting to correlate the position of the first two exon borders with the repetitive primary structure of the thyroglobulin amino terminus and with the location of the hormonogenic domain in that region. Intron 1 splits the Glu residue immediately preceding the hormonogenic tyrosine and intron 2 is located in the middle of the first ten-times repeated type I motif. Sequencing and compilation of the 40 intron positions in the thyroglobulin gene will eventually allow the emergence of a convincing evolutionary model for this protein (J. Parma, unpublished results).

Analysis of about 900 bp upstream from the 5′ end of the bovine thyroglobulin gene demonstrates an interesting pattern of homology with the human thyroglobulin promoter region. In addition to a canonical ‘TATA’ box and a potential ‘CAAT’-like element, the 200 nucleotides preceding the transcriptional start site show remarkable similarities (Fig. 5). The region immediately upstream from the ‘CACT’ box is particularly well conserved with a 20-nucleotide perfect match. Current studies suggest that this segment is involved in transcription control by a cAMP-dependent mechanism (Christophe et al., unpublished data).

Around position -556 to -518, the bovine sequence contains a 38-bp purine stretch interrupted by only three pyrimidine residues. This predominantly homopurine DNA stretch is reminiscent of the 209-nucleotide purine-rich se- quence found between positions -512 and -303 in the 5’- flanking region of the human gene [7]. When subcloned in supercoiled plasmids, the similarity between the bovine and human polypurine clusters is further demonstrated by detec- tion of an S1-nuclease-hypersensitive site in close proximity to the bovine cluster. While the presence of such a site in the promoter area of other eukaryotic genes has been correlated

5 6 7 6 9

-9U0 CCGGCAGIACGGCATAACLAACCTATGCCTACAGCATCCAGGGT~ACG~TGCC~GGATGACGATGAGCGCATTGTTAGATTTCAIA~CGGTGLCT -800 GACTGLGTlAAGTTTARCTGTGATAAACTACCGCATTAAAGCTTLCTGCTGCC~TTGGTTGTCTGACCTCCTGGGALAGAGGGG~CGGGGATGACTA -7OU CGACTATCACTGTGCGTGTGTTTGGCTTATCTCATCAAAATCTCTACATTCTGTGTTAATGGATCTGLCTGTTTTGTTCCCTGLCATATCLTCAT~CCT

“Pu” x -600 AGAATAGTGTCTGCTTCTCTATCAGACTCTAAAGAAA~ATTGC~GGAGGGAAGGAAGGA~ATGGAT~GGAGGGAGGGAG~ATTGTGTTTCTCTCACG -5M GTGGGCLTGAAC~TCTGECCCACCAAGTTGTTAACTTTGGCCTTTALCCLTGAAGATG~TTAIGAAGCCACACLCCCAGTTCTTCCTTGGTGGCTCAGA . .~ . ...~~ .... ~ ~ ~ . ~ . -w TGGTCAAGAATCCACCTGCARTGCGG~AGACCTGGGT~GATCC~T~~GGTTGGGAAGATCCCCTGGAG~GCGAATGGLTALC~CTCCAGTATTCT~GL

-100 CAGAGAAMCAGGGTGG~~~GCTTCCCTGA~~TGC~TGTGGGTGGSGGCTAAGTACCCACAGCAGTGCT~GCCTCCTTGGCCAGAGCCCTAAGGT

TTG GCA TCC GCC ARC ATC TTT ~GGIARGTTCTGACCCTGCGGTCTCAGAGCATCGCGTTGGGAGG~~ACCTCTGAGGLCAACTCCTTGTTAGC~

-3UO CTGGAGAATCCCATGCACAGAGGAGCCTGGCGGGATGCAGTCCATGG~GTCTCAGAGAGTCAGATGT~CTGAGCGACTTT~CACACATT~GTLCCTGG -200 TTCTGCTCCCCTACAGCCTCCACAAGATTTTCACCCCACACTGGCCACATGAGTGTCCTC~GGGGAACAGACGCAGGTGGAGGACCTCCTTGTGAC~G

EXON + 1 ZSGCAGCAGCTTCTRACCCTTCTCCCTGGAAGGCTCCU\ATG GCC CTG GCC CTA TGG GTC TTC GGT CTG CTG GAG T T A ATC E C

M e t A 1 1 L e u A l a Leu T i p V a l Phe G l y L e u L e u LIP L e u l i e CIS

L e u 41. Ser A l a Asn I l e Phe G GGACCTGLCCCAGGGCCATGGAACATTTGTGACCTCATTTAATCCTCAGTGTCCCGAGGTCAGTTATGCACTCTTTC~TTTT~GATGAA~AGACAGAGG GTCTGAGAAGTCACAGAGTCAGTGATC ------- I N m o N 1 ----- GATCAGGAGAGTGATGTCPAACTGTGAGTAGCTGCARAAGGTT GAAATATACTTGCCGATGGTGGTAGAAATAAAACCCGGAAGAGCGTCTGTGTCTTCAGCGAGCCCGTCTC~GTTACTGG~TLCAGGCAGCAGATTCICT

EXON 1

CTCATGAGCGGCATCACTTCCTTTGTGCA~AG TAG CAG GTG GAT GCC CAG CCT CTC CGC CCA TGT GAG CTG CAG AGG GAG AGG l u T y r Gln V a l Asp A l a G I n P r o Leu A r g P r o Cys G l u Leu G I “ A r g Glu A r g

GCT TTT CTG AAG CGA GAA GAC TAC GTC ccc GAG TGC GCC GAG GAT GGC AGC TTC CA~TAAGGCCTCLTCU\GCCATGCW\CC A l a Phe L e u L y i A r g Glu Asp l y r V a l Pro G l n C y r A l a G l v Asp G l y Ser Phe GI

TCATCTGACCTCTGTTGCCCCAATCCATCGTTCCTTCCAGCCAGAACCAAAGAGTTCCCTTCCTTTCCTGC~TTTTTTACTGTPATTGTTTT~ AGGtiTGAI\TGGGGTGGTTGGCCGGCCGGCCAGTATTTACCTTTCCAATGATC --------- INTRON 2 ------.

GAGGGGCCTCCTTAAAACCCATCTTCTTCCTCTCCCCGC~CCCCTCCTTG~CTGACCCT~GACCATGTATCGGAGCATCCCAGACCCGT~~~AG

Fig. 4. Nucleotide sequence of the 5‘ end of the bovine thyroglobulin gene. Negative numbers indicate nucleotide positions upstream from the transcription start site. Amino acids of the two first exons are indicated under their respective codons. The homopurine sequence (“Pu” box), the CAAT box homology and the TATAAA box are boxed. The cap site is indicated by + 1

597

- 5 2 1 -501 0

~ I ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ T ~ C T ~ ~ ~ ~ ~ ~ ~ C A A A C A A A ~ ~ A A G ~ A A C ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ A A C A A A C A ~ ~ A C A A A C A C C G

C T T C T C T A T C A G A C T - C T A A G A A A C I T T G C T

- 5 8 7 - -567

A A G C A A C G A A A G A A A C A A G G A A G A A T G A A A G G A A A G A A G G A T A G A A G G G A G A A A G G A A G G A A A G A A G A A A G ~ A A G G A A G A A A G - A G G A G G G A A G G A A G G A G C A T G G A T G A G G A G G G A G G G A G , - -304

" P U " bore.

0 G A A A G A A G C A A G G A G G A A G G A G A A G G A G A A A G G T A G G T G G G G A A G ~ A A A G A A ~ A A A G ~ ~ C T ~ ~ ~ ~ C ~ ~ ~ ~ ~ T ~ ! ~ ~ ~ ~ T ~ hh..

C A T T ~ T G T T T C I C T C A C G G T G G G C C T

-518 - *

-421

G ~ T G G C T C A G A T G G T G A A G A A T C C A C C T G C A A T G C G G G A G A C C T G G G T T T G A T C C C T G G G T T G G G A A G A T C C C C T G G A G A A G G

I BOVINE INSERT

nunan A %ov 1 nc

numan

Bov 1 n r

nuurn

B o v i n e

numan

B o v l n e

B o v i n e

B o v i n e

Human B o v i n e

Human

B o v i n e

Human

B o v i n e

Humar:

B o v i n e

nunin

B o v i n e

G A A T G G C T A C C C A C T C C A G T A T T C T G G A C A A T C C C A T G G A C A G A G G A G C C T G G C G G G A T G C A G T C C A T G G G G T C T C A G

~'~~AG::::F??CC?FTAGFTC

-202

G C C T C C A C A A G A T T T T C A C C C C A

-196

*. I 195 p b /

'. \ ,/' CACT EXON 1 T TaTaAaA ATG

+ 1C1 / HUMAN ' -521

&AT& ATlG& GTTCTGTTC -86 -34 +I + & 2 + I 0 8

-207 -202 -SO1 -304

- 5 6 2 -518 -421 -196

-^A~G ATTG" GTTCT GCTC - 82 -31 +1 t42 +m I

...- + 336

BOVINE _._... J-Z-5-7 - 897 I \ / \

/ \. ,'BOVINE INSERT ~ \ \

\ CACT -= , ' pU Box- /

B M F

220 bp 100 b p

I

Fig. 5 . Homologies between the human and bovine thyroglobulin promoter regions. (A) Nucleotide comparison of the 5'-end sequence of the human thyroglobulin gene with its bovine counterpart. Several gaps in each sequence were introduced to allow maximal alignment. (B) Schematic representation of the relationship between the human and bovine 5'-flanking sequences. Both homopurine stretches (the long human "Pu" box and the bovine shorter sequence) are indicated by a double line. Each poly(purine) segment is bordered by a similar short repeat (ATTG), itself bounded by a palindromic sequence. The 220-bp bovine sequence which has no counterpart in the human sequence ('bovine insert') is boxed; this segment is bordered by a terminal inverted repeat (filled arrows) and surrounded by a similar short direct repeat (open arrows), several tri-, tetra- and pentanucleotide sequences (underlined) are repeated at the 5' end of the insertion and a specific sequence (CCCTAC) is present at its 3' end [41]

[Fig. 6. Alignment of the bovine insert with the BMF consensus sequence; the common 73-bp sequence and position -196 are marked.]

Fig. 6. Identification of the 220-bp bovine insert. Homology between the 220-bp inserted element and the consensus sequence of the 117-bp bovine monomeric dispersed repeat, generally preceded by a common 73-bp sequence (BMF) [29]. Perpendicular lines indicate the same nucleotide as in the consensus sequence, the 3' ends being flanked by different simple repeats. The internal inverted repeats of the 'bovine insert' are denoted by filled arrows and the flanking shorter direct repeats by open arrows. The dotted lines indicate both Sau3A sites. A (CA)n cluster is twice underlined.

with expression of these genes when detected in intact chromatin [38], no systematic correlation was found in other cases where the S1 nuclease-hypersensitive sites were measured only in supercoiled recombinant molecules [39].

Interestingly, our bovine purine/pyrimidine stretch is flanked on either side by a short direct repeat and bordered by a seven-nucleotide palindromic sequence. Although a similar arrangement is not found for other homopurine segments, this structure is reminiscent of the sequences present at the borders of inserted transposable elements in both prokaryotes and eukaryotes [40, 41].

A short repeated sequence (Figs 5 and 6) presenting a strong homology with a family of truly transposable elements is found 97 bases 3' of the purine/pyrimidine stretch. It belongs to the bovine monomeric repeat family (BMF) [29], which is composed of a 117-bp monomeric segment generally preceded by a common 73-bp sequence (Fig. 6). Elements of this family have been identified in the bovine adrenocorticotropin/β-lipotropin gene region [42], in the goat βA and βC globin genes [43] and in the bovine 1.709-g/cm3 satellite [29].
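The degree of homology between an inserted element and a repeat-family consensus, as displayed by the perpendicular lines in Fig. 6, can be summarized as a percent identity over the aligned positions. The following is a minimal sketch, assuming that a gapped pairwise alignment is already available. The function name `percent_identity` and the toy sequences are hypothetical illustrations; they are not taken from the figure, and this is not the procedure used in the present work.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return the percentage of gap-free alignment columns with identical bases."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns where both bases are identical
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # gap columns are excluded from the comparison
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment (assumption: for illustration only).
toy_consensus = "TGGCTCAG-TGGTAAAG"
toy_insert = "TGGCTCAGATGGTGAAG"
print(f"{percent_identity(toy_consensus, toy_insert):.2f}% identity")
```

Excluding gap columns from the denominator is one common convention; counting gaps as mismatches would give a lower figure, so the convention should be stated whenever such a value is reported.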

Knowledge of the structural organization of the thyroglobulin gene and availability of its cloned promoter provide the basis for investigating the control of its expression by thyrotropin and cAMP, and for understanding in molecular terms the defective gene expression found in hereditary goiters.

The continuous support and critical interest of Dr J. E. Dumont are gratefully acknowledged. We are indebted to Dr Jan de Vijlder for his generous hospitality in the Academic Medical Center of Amsterdam and to Prof. Jean Leloup for allowing us to work in the Laboratoire de Physiologie Générale et Comparée du Muséum National d'Histoire Naturelle of Paris. We thank Dr F. Baas for familiarizing us with library screening techniques and Mr M. Georges for his collaboration in polymorphism experiments. We are grateful to Mrs G. Pattyn for her excellent technical assistance in electron microscope techniques, Mrs M. J. Simons for her help in sequencing and Mrs P. Miroir for the preparation of the manuscript. This study has benefited from grants of the following organizations: Institut pour l'Encouragement de la Recherche Scientifique dans l'Industrie et l'Agriculture, Actions Concertées, Fonds de la Recherche Scientifique Médicale and the National Institutes of Health. G. dM. held a grant from the Commission of the European Community.

REFERENCES

1. Van Herle, A. J., Vassart, G. & Dumont, J. E. (1979) N. Engl. J. Med. 301, 239-249 and 307-314.
2. Vassart, G. & Brocas, H. (1980) Biochim. Biophys. Acta 610, 189-194.
3. Rolland, M., Montfort, M. F. & Lissitzky, S. (1973) Biochim. Biophys. Acta 303, 338-347.
4. Mercken, L., Simons, M. J., Swillens, S., Massaer, M. & Vassart, G. (1985) Nature (Lond.) 316, 647-650.
5. Christophe, D., Brocas, H., Cannon, F., de Martynoff, G., Pays, E. & Vassart, G. (1980) Eur. J. Biochem. 111, 419-423.
6. Christophe, D., Mercken, L., Brocas, H., Pohl, V. & Vassart, G. (1982) Eur. J. Biochem. 122, 461-469.
7. Christophe, D., Cabrer, B., Bacolla, A., Targovnik, H., Pohl, V. & Vassart, G. (1985) Nucleic Acids Res. 13, 5127-5144.
8. Rigby, P. W. J., Dieckmann, M., Rhodes, C. & Berg, P. (1977) J. Mol. Biol. 113, 237-251.
9. Pearson, W. R., Wu, J. R. & Bonner, J. (1978) Biochemistry 17, 51-59.
10. Smith, G. E. & Summers, M. D. (1980) Anal. Biochem. 109, 123-129.
11. Wahl, G. M., Stern, M. & Stark, G. R. (1979) Proc. Natl Acad. Sci. USA 76, 3683-3687.
12. van Ommen, G. J. B., Arnberg, A. C., Baas, F., Brocas, H., Sterk, A., Tegelaers, W. H., Vassart, G. & de Vijlder, J. J. M. (1983) Nucleic Acids Res. 11, 2273-2285.
13. Blin, N. & Stafford, D. W. (1976) Nucleic Acids Res. 3, 2303-2308.
14. Grosveld, F. G., Dahl, H. H. M., de Boer, E. & Flavell, R. A. (1981) Gene 13, 227-237.
15. Hanahan, D. & Meselson, M. (1980) Gene 10, 63-67.
16. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular cloning - A laboratory manual, Cold Spring Harbor Laboratory, New York.
17. Sato, S., Hutchinson, C. A. III & Harris, J. I. (1977) Proc. Natl Acad. Sci. USA 74, 542-546.
18. de Martynoff, G., Pays, E. & Vassart, G. (1980) Biochem. Biophys. Res. Commun. 93, 645-653.
19. Johnson, D. A., Gautsch, J. W., Sportsman, J. R. & Elder, J. H. (1984) Gene Analysis Techniques 1, 3-8.
20. Pohl, V. (1984) Inserm Symp. 8, 309-323.
21. Davis, R. W., Simon, M. & Davidson, N. (1971) Methods Enzymol. 21, 413-428.
22. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
23. Ish-Horowicz, D. & Burke, J. F. (1981) Nucleic Acids Res. 9, 2989-2998.
24. Targovnik, H. M., Pohl, V., Christophe, D., Cabrer, B., Brocas, H. & Vassart, G. (1984) Eur. J. Biochem. 141, 271-277.
25. Georges, M., Lequarre, A. S., Hanset, R. & Vassart, G. (1987) Anim. Genet., in the press.
26. Mercken, L., Simons, M.-J., de Martynoff, G., Swillens, S. & Vassart, G. (1985) Eur. J. Biochem. 147, 59-64.
27. Rawitch, A. B., Chernoff, S. B., Litwer, M. R., Rouse, J. B. & Hamilton, J. W. (1983) J. Biol. Chem. 258, 2079-2082.
28. Breathnach, R. & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
29. Skowronski, J., Plucienniczak, A., Bednarek, A. & Jaworski, J. (1984) J. Mol. Biol. 177, 399-416.
30. Gitschier, J., Wood, W. I., Goralka, T. M., Wion, K. L., Chen, E. Y., Eaton, D. H., Vehar, G. A., Capon, D. J. & Lawn, R. M. (1984) Nature (Lond.) 312, 326-330.
31. Ricketts, M. H., Pohl, V., de Martynoff, G., Boyd, C. D., Bester, A. J., Van Jaarsveld, P. P. & Vassart, G. (1985) EMBO J. 4, 731-737.
32. Baas, F., van Ommen, G. J. B., Bikker, H., Arnberg, A. C. & de Vijlder, J. J. M. (1986) Nucleic Acids Res. 14, 5171-5186.
33. Musti, A. M., Avvedimento, E. V., Polistina, C., Ursini, V. M., Obici, S., Nitsch, L., Cocozza, S. & Di Lauro, R. (1986) Proc. Natl Acad. Sci. USA 83, 323-327.
34. Baas, F., Bikker, H., Van Ommen, G. J. B. & de Vijlder, J. J. M. (1984) Hum. Genet. 67, 301-305.
35. Schumacher, M., Camp, S., Maulet, Y., Newton, M., MacPhee-Quigley, K., Taylor, S. S., Friedmann, T. & Taylor, P. (1986) Nature (Lond.) 319, 407-409.
36. Swillens, S., Ludgate, M., Mercken, L., Dumont, J. E. & Vassart, G. (1986) Biochem. Biophys. Res. Commun. 137, 142-148.
37. Gilbert, W. (1985) Science (Wash. DC) 228, 823-824.
38. Larsen, A. & Weintraub, H. (1982) Cell 29, 609-622.
39. Mace, H. A. F., Pelham, H. R. B. & Travers, A. A. (1983) Nature (Lond.) 304, 555-557.
40. Jelinek, W. R. & Schmid, C. W. (1982) Annu. Rev. Biochem. 51, 813-844.
41. Streeck, R. E. (1982) Nature (Lond.) 298, 767-769.
42. Watanabe, Y., Tsukada, T., Notake, M., Nakanishi, S. & Numa, S. (1982) Nucleic Acids Res. 10, 1459-1469.
43. Schon, E. A., Cleary, M. L., Haynes, J. R. & Lingrel, J. B. (1981) Cell 27, 359-369.