
Proc. Natl. Acad. Sci. USA Vol. 82, pp. 8080-8084, December 1985 Developmental Biology

Structure, evolution, and regulation of a fast skeletal muscle troponin I gene

(muscle genes/exon organization/functional domains/homologous sequences/coordinate transcription)

ALBERT S. BALDWIN, JR.*, ELLEN L. W. KITTLER, AND CHARLES P. EMERSON, JR.
Department of Biology, Gilmer Hall, University of Virginia, Charlottesville, VA 22901

Communicated by Oscar L. Miller, Jr., August 8, 1985

ABSTRACT The complete structure of a quail fast skeletal muscle troponin I gene was determined by nucleotide sequence comparison of troponin I genomic and cDNA sequences. This 4.5-kilobase troponin I gene has eight exons. The actin-binding domain of troponin I is encoded by a single exon, whereas the troponin C-binding domain is split into at least two exons. The exon organization of the fast troponin I gene suggests that gene conversion directs the nonrandom conservation of the carboxyl-terminal halves of troponin I isoforms and that the amino-terminal extension of the cardiac isoform originated by splice-junction sliding. Comparison of the structure of the troponin I gene with the structures of other contractile protein genes reveals homologous sequences in their 5' flanking regions and similar large introns that separate protein-coding exons from 5' nontranslated exons. These common structural features may function to coordinate the activation of contractile-protein genes during myogenesis.

Troponin I is a family of three muscle-specific myofibrillar proteins involved in the calcium regulation of contraction in cardiac and in skeletal muscle (1). Troponin I proteins have multiple distinct functional domains that bind with high affinity to actin (2) and troponin C (3) (see Fig. 1). The interactions of these domains regulate actomyosin ATPase activity in resting and contracting muscle (4). Troponin I also interacts functionally with other muscle proteins, including troponin T (5). The specific domains involved in these other interactions have yet to be identified.

Amino acid sequence studies have revealed that avian and mammalian muscles differentially express three related troponin I isoforms specific to cardiac and to fast and slow skeletal muscles (1). Comparison of their amino acid sequences indicates that these three troponin I isoforms are encoded by separate genes that arose by gene duplication prior to the divergence of birds and mammals more than 250 million years ago. However, the evolution of these troponin I protein isoforms has been strikingly nonrandom (Fig. 1). Their amino-terminal halves are highly divergent, whereas their carboxyl-terminal halves are highly homologous (1). Furthermore, the cardiac isoform has a 26-residue amino-terminal extension. The functional significance and evolutionary origin of the divergent and homologous sequences in troponin I isoforms are unknown.

The troponin I gene is one of the set of muscle-specific genes that are coordinately activated during the differentiation of embryonic myoblasts (6). A previous report described the isolation and nucleotide-sequence analysis of cDNA clones encoding quail fast muscle troponin I and other regulated muscle protein mRNAs (7). Here we report the complete structure of the quail fast skeletal muscle troponin I gene and its upstream transcriptional promoter. The structure of this troponin I gene now provides a basis for examining the evolutionary origins of the multiple functional domains of troponin I, the evolutionary origins and divergence of troponin I protein isoforms, and the molecular basis of the coordinate transcriptional regulation of troponin I and other muscle genes during muscle development.

MATERIALS AND METHODS

Isolation and Mapping of the Troponin I Gene. The isolation of overlapping quail fast skeletal muscle troponin I cDNA clones has been described (7). One of these clones, cC120, was used as a hybridization probe (8) to screen a genomic library of ~20-kilobase (kb) partial EcoRI restriction fragments of quail embryo DNA cloned in the λ Charon 4A vector (9-12).

DNA Sequence Analysis. Genomic and cDNA nucleotide sequences were determined by the method of Maxam and Gilbert (13). Of the sequence, 95% was confirmed by sequencing both DNA strands. DNA sequences were edited and analyzed by use of the Stanford Molgen computer system programs, a VAX 11/750 computer, and the GenBank database (distributed by Bolt, Beranek and Newman, Inc., Cambridge, MA).
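A "95% confirmed on both strands" figure is a double-strand coverage fraction: the share of positions in a region read at least once on each strand. The sketch below computes that fraction from read intervals. It is a minimal illustration under stated assumptions. The read coordinates are invented, both strands' reads are assumed to be expressed in top-strand coordinates, and the function names are hypothetical. It is not the Molgen software used in this study.

```python
def covered_positions(intervals):
    """Return the set of 0-based positions covered by half-open [start, end) reads."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def double_strand_fraction(length, top_reads, bottom_reads):
    """Fraction of positions in [0, length) read on both the top and bottom strands.

    Assumes bottom-strand reads are already given in top-strand coordinates.
    """
    both = covered_positions(top_reads) & covered_positions(bottom_reads)
    return sum(1 for p in both if 0 <= p < length) / length


# Hypothetical read coordinates for a 1,000-bp region (not data from this study).
top_reads = [(0, 400), (350, 800), (780, 1000)]
bottom_reads = [(0, 300), (300, 650), (600, 950)]
print(f"{double_strand_fraction(1000, top_reads, bottom_reads):.0%}")  # 95%
```

In this toy case the top strand covers the whole region, but the bottom strand stops at position 950, so 950 of 1,000 positions are confirmed on both strands.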

Nuclease S1 Mapping. The 5' and 3' gene transcript boundaries were mapped by nuclease S1 analysis (14), as described (15).

RESULTS

Cloning of the Quail Fast Skeletal Muscle Troponin I Gene. Partial DNA sequence analysis of three cDNA clones isolated from a cDNA library of quail myofiber-specific RNAs revealed that these clones encode the fast skeletal muscle isoform of troponin I (7). The cDNA clones, cC106, cC112, and cC120, are overlapping and encode all of the protein-coding region of the troponin I mRNA as well as 5' and 3' nontranslated sequence.

Troponin I genomic clones were isolated by screening an embryonic quail genomic DNA library in λ Charon 4A, using cC120 as the probe. One genomic clone recovered in this screen, gClTnI4 (previously referred to as λQETnI4), has a 16-kb genomic DNA insert that includes the complete troponin I gene.

The Nucleotide Sequence of the Troponin I Gene. The nucleotide sequence of the region of gClTnI4 homologous to fast muscle troponin I cDNA clones was compared with the nucleotide sequences of the cDNA clones cC106, cC112, and cC120. The sequences of these cDNA clones are overlapping and include 43 base pairs (bp) of 5' nontranslated sequence, the entire 546-bp fast skeletal TnI protein-coding sequence, and 140 bp of 3' nontranslated sequence. Fig. 2 shows the

Abbreviations: bp, base pair(s); kb, kilobase(s).
*Present address: Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139.


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


[Figure 1 diagram: fast, slow, and cardiac troponin I sequences aligned from the NH2 to the COOH terminus, with intron positions 2-7 of the fast gene marked and the TnC-binding and actin-binding domains indicated.]

FIG. 1. Amino acid sequence comparison of rabbit cardiac and fast and slow skeletal troponin I protein isoforms. These data, based on the studies of Wilkinson and Grand (1), show troponin I protein sequences aligned for maximum homology. With the exception of two residues, the chicken fast skeletal troponin I sequence is identical to the conserved residues of the three rabbit isoforms. Residues shared by all isoforms are shown as vertical lines, deletions are shown as ( ), and the positions of introns in the quail fast skeletal troponin I gene are numbered and shown as v (see Fig. 3). The actin-binding and troponin C (TnC)-binding domains were established by peptide-affinity binding studies (1).

sequence of 4500 bp of genomic gClTnI4 DNA homologous to the TnI cDNA clones.

An unambiguous exon arrangement of the troponin I gene was derived by comparing genomic DNA and cDNA sequences and by nuclease S1 mapping of the 5' and 3' gene boundaries. The quail fast skeletal muscle troponin I gene is dispersed in eight exons (Figs. 2 and 3). In all cases introns contain consensus 5' and 3' splice junctions (16). Genomic troponin I exon sequences and cDNA sequences are identical, establishing that the gClTnI4 genomic segment is a functional fast skeletal muscle troponin I gene. The quail troponin I protein sequence derived from the nucleotide sequence (Fig. 2) is identical to that of the chicken fast skeletal muscle troponin I protein (1). Genomic blotting experiments showed that the restriction fragments of troponin I sequences in gClTnI4 match those in quail genomic DNA, indicating that fast troponin I is a single-copy gene (data not shown). A computer search of the protein-coding potential of the intron sequences of the fast muscle troponin I gene did not reveal additional exons encoding amino acid

[Figure 2 sequence: nucleotides 1-5027 of the quail fast skeletal muscle troponin I gene, with exons I-VIII labeled and the derived amino acid sequence shown above the coding exons.]

FIG. 2. Nucleotide sequence of the quail fast skeletal muscle troponin I gene. The derived troponin I amino acid sequence is shown above the nucleotide sequence; the termination codon is indicated by three asterisks. The GT and AG intron splice junctions are underlined. The major transcription start site is indicated by a horizontal arrow, and poly(A)-addition sites are marked by arrowheads. The upstream "TATA," "CCATT," and muscle gene homologous sequences and the downstream AATAAA polyadenylylation consensus sequence are boxed. Two short gaps (22 and 23 bp) in the sequence of the first intron are indicated.
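The landmarks marked in Fig. 2 are located in Results by fixed offsets. The major start site is the residue 30 nucleotides downstream of the first T of TTTTATA. The poly(A) sites map 12, 15, and 43 nucleotides 3' of the final A of AATAAA. The sketch below applies those offsets to made-up toy sequences. It does not reproduce any part of the troponin I sequence, and the function names are hypothetical. Overlapping matches are found with a regular-expression lookahead.

```python
import re


def find_motif(seq, motif):
    """Return 0-based start positions of every occurrence of motif, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def predicted_start_sites(seq, box="TTTTATA", distance=30):
    """Position `distance` nt downstream of the first T of each TATA-like box."""
    return [pos + distance for pos in find_motif(seq, box)]


def predicted_poly_a_sites(seq, signal="AATAAA", offsets=(12, 15, 43)):
    """Positions `offsets` nt 3' of the final A of each polyadenylylation signal."""
    sites = []
    for pos in find_motif(seq, signal):
        final_a = pos + len(signal) - 1
        sites.extend(final_a + off for off in offsets)
    return sites


# Made-up toy sequences (not the troponin I gene).
promoter = "GGCC" + "TTTTATA" + "C" * 40
terminator = "GC" + "AATAAA" + "T" * 50
print(predicted_start_sites(promoter))     # [34]
print(predicted_poly_a_sites(terminator))  # [19, 22, 50]
```

Positions are 0-based indices into each toy string. Reporting them relative to a start site, as the paper does, would only require subtracting that site's index.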


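[Figure 3 diagram: exons I-VIII of the fast troponin I gene drawn on a 5-kb scale, with the TATA homology, ATG, TAA, and AATAAA positions marked. Amino acid codons encoded by exons II-VIII: 1-2, 2-4, 5-18, 19-61, 62-91, 92-150, and 151-182. Hatched areas mark the troponin C-binding and actin-binding domains.]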

FIG. 3. The intron/exon organization of the fast troponin I gene. Exon sequences are shown as blocks and introns as lines. The positions of the TATA homology upstream, the ATG translation initiation codon, the TAA stop codon, and the AATAAA polyadenylylation consensus sequence are indicated. The amino acid codons in exons are numbered below. The positions of the troponin C- and actin-binding domains are indicated by the hatched areas.

sequences of cardiac or slow muscle troponin I isoforms.

The 5' and 3' Boundaries. Upstream of the genomic sequences homologous to the 5' nontranslated sequences of cC106 are the sequences TTTTATA and TAAA, similar to the TATA consensus sequence present in the upstream promoter regions of most eukaryotic, polymerase II-dependent structural genes (17). Identification of this region as the troponin I promoter was further established by nuclease S1 mapping from a Bgl I site in the 5' nontranslated sequence of both the genomic DNA and cC106 (15). The predominant 5' mRNA terminus, accounting for ~60% of the S1-protected fragments, is located at the G residue 30 nucleotides downstream of the first T of the sequence TTTTATA (Fig. 2). Another 5' terminus was detected by S1 analysis ~43 nucleotides upstream of the major terminus and 29 nucleotides from a TAAA sequence, suggesting that the fast muscle troponin I gene has two transcription start sites. The predominant 5' terminus detected by S1 analysis has been confirmed by primer-extension analysis, but the putative upstream second terminus has not been detected by this method and requires further investigation.

The 3' boundary of the troponin I gene was identified by the presence of an AATAAA sequence, characteristic of the polyadenylylation consensus sequence of eukaryotic polymerase II-dependent structural genes (18), located 190 nucleotides downstream of the TAA translation stop codon (Fig. 2). S1 mapping analysis (15) reveals that troponin I RNAs have three termini that map 12, 15, and 43 nucleotides 3' of the final A of the AATAAA sequence (Fig. 2). Thus, the 5' and 3' boundaries of this fast skeletal muscle troponin I gene are within a 4500-bp region that encodes 830 bp of troponin I mRNA sequence dispersed in eight exons. The troponin I mRNA encoded by this gene includes 82 bp of 5' nontranslated sequence from the predominant transcription start site, 546 bp of protein-coding sequence, and ~200 bp of 3' nontranslated sequence.

Intron/Exon Arrangement of the Troponin I Gene. Exons of the quail fast skeletal muscle troponin I gene range in size from 300 bp for exon VIII to only 7 bp for exon III, and all are bounded by GT/AG splice junctions. To our knowledge, exon III is the smallest yet reported for any gene (19). The first exon encodes only nontranslated RNA sequence and is separated from the translation start codon in the second exon by a relatively large (1700-bp) first intron (Figs. 2 and 3). The first five exons and their associated introns make up ~80% of the gene sequence. This gene organization concentrates the DNA encoding the carboxyl-terminal half of the protein in the 3' portion of the gene (Fig. 3). The carboxyl-terminal 32 amino acids, the translation stop codon (TAA), and the entire 3' nontranslated sequence are included in exon VIII.

Troponin I proteins have two known functional domains. The troponin C-binding domain of fast muscle troponin I is located between amino acid residues 10 and 22 (1). This domain is encoded by exons IV and V (Figs. 1 and 3) and thus is split by an intron. Actin-binding domains and actomyosin ATPase-inhibition domains of fast troponin I are located in the region of amino acids 98-119 (1). These domains are encoded entirely within exon VII (Figs. 1 and 3).
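The domain-to-exon assignments above can be checked mechanically against the codon numbering in Fig. 3. The sketch below is minimal and rests on one assumption: the seven codon ranges listed under Fig. 3 belong to exons II-VIII in order, since exon I is nontranslated. The names `EXON_CODONS`, `exons_for_residues`, and `is_split_by_intron` are illustrative and do not refer to software from this study.

```python
# Codon ranges for exons II-VIII as listed under Fig. 3 (residue numbering as
# in Fig. 3). A codon split by an intron is counted in both neighbouring exons.
EXON_CODONS = {
    "II": (1, 2),
    "III": (2, 4),
    "IV": (5, 18),
    "V": (19, 61),
    "VI": (62, 91),
    "VII": (92, 150),
    "VIII": (151, 182),
}


def exons_for_residues(first, last):
    """Return the exons whose codon ranges overlap residues first..last (inclusive)."""
    return [name for name, (lo, hi) in EXON_CODONS.items()
            if lo <= last and first <= hi]


def is_split_by_intron(first, last):
    """A stretch of residues is interrupted by an intron if more than one exon encodes it."""
    return len(exons_for_residues(first, last)) > 1


DOMAINS = {"troponin C-binding": (10, 22), "actin-binding": (98, 119)}

for name, (first, last) in DOMAINS.items():
    exons = exons_for_residues(first, last)
    print(f"{name} {first}-{last}: exon(s) {', '.join(exons)}; "
          f"split by intron: {is_split_by_intron(first, last)}")

# Approximate mRNA length from the predominant start site:
# 82 bp 5' nontranslated + 546 bp coding + ~200 bp 3' nontranslated.
print(82 + 546 + 200)  # 828, consistent with the ~830 bp quoted in the text
```

Under these assumptions, the troponin C-binding domain maps to exons IV and V. The actin-binding region falls entirely within exon VII. Exon VIII spans the last 32 codons. All three results match the text.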

5' Flanking DNA: Sequence Homology with Muscle α-Actin Genes. Nuclease S1 mapping identified a major transcription start site for troponin I mRNA (Fig. 2). Upstream of this site is a sequence, TTTTATA, that is similar to the TATA homology (17) found in approximately the same location in most RNA polymerase II-dependent genes. Two sequences, CCATT and CCAT, similar to the CCAAT homology (23), are located 100 and 80 nucleotides upstream of the major start site (Fig. 2). Thus the immediate 5' flanking DNA of the troponin I gene resembles the promoter regions of other eukaryotic genes.

Troponin I gene expression is coordinately regulated with other muscle-specific genes during myogenesis (6), and thus it is of interest to examine whether these coexpressed genes share upstream muscle-specific sequence homologies. A computer search of troponin I 5' flanking DNA reveals a sequence homologous to a 20-bp sequence found 100 bp upstream of the mRNA start site of both chicken and rat skeletal muscle α-actin genes (20, 24, 25). This homologous troponin I sequence is located 329 bp upstream of the major mRNA transcription start site, matches the chicken α-actin sequence in 12 of 17 positions, and is related to sequences in the 5' flanking regions of a skeletal muscle myosin light chain 3 gene (21) and a cardiac myosin heavy chain gene (22) (Fig. 4).
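The "12 of 17 positions" figure is a count of identical positions in an ungapped alignment. The sketch below reproduces that count. It uses the chicken α-actin and quail troponin I 17-bp sequences as we read them from the Fig. 4 alignment; the quail sequence also appears in the boxed region of Fig. 2. Both the transcription of these sequences and the function names are our own.

```python
def count_identities(seq_a, seq_b):
    """Count positions at which two pre-aligned, equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))


def mismatch_positions(seq_a, seq_b):
    """1-based positions at which two aligned sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
            if a != b]


# 17-bp sequences as read from the Fig. 4 alignment (our transcription).
chick_actin = "GCCCGACACCCAAATAT"  # chicken skeletal alpha-actin, 5' border at -100
quail_tni = "GGACGACCAGCAAATAT"    # quail fast troponin I, 5' border at -329

print(f"{count_identities(chick_actin, quail_tni)} of {len(chick_actin)} positions")
print(mismatch_positions(chick_actin, quail_tni))  # [2, 3, 8, 9, 10]
```

The five mismatches fall in the G+C-rich region 1 (positions 2 and 3) and the variable core (positions 8-10). The A+T-rich region 3 matches completely.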

DISCUSSION

The structure of a quail fast muscle troponin I gene has been determined unambiguously by comparing nucleotide sequences of fast skeletal muscle troponin I cDNAs (Fig. 2) with homologous troponin I genomic nucleotide sequences. This sequence comparison, along with nuclease S1 mapping to define the 5' and 3' transcribed gene boundaries, demonstrates that the 830-bp troponin I mRNA is encoded by a 4.5-kb gene composed of eight exons (Fig. 3). The exon sequences of this quail troponin I gene are identical to those of fast skeletal muscle cDNAs and encode a protein identical to the chicken fast troponin I isoform (1). The structure of the troponin I gene, together with those of other vertebrate myosin and actin muscle genes (21, 22, 24-27), now provides a basis for understanding the evolution of contractile protein gene families and their muscle-specific regulation.

The structure of the fast troponin I gene reveals different exon organizations for the functional actin-binding and troponin C-binding domains of the troponin I protein. The troponin I actin-binding domain is encoded exclusively within one exon, exon VII (Fig. 3), and thus exhibits an exon-domain relationship common to many proteins (28, 29). It will be of interest to determine whether the variety of genes


Gene                              Position   Region 1   Region 2   Region 3
Chick α-actin                         -100   GCCCGA     cacc       CAAATAT
Rat α-actin                           -100   GCCCaA     cacc       CAAATAT
Quail troponin I                      -329   GgaCGA     ccag       CAAATAT
Chick myosin light chain 3            -256   GCCCGg     acaa       ggAATAT
Rat cardiac myosin heavy chain        -411   agaCag     ggga       CAAATAT
Consensus                                    GCCCGA     cacc       CAAATAT

FIG. 4. Sequence homologies upstream of quail fast troponin I, chicken and rat skeletal muscle α-actin (20), chicken skeletal myosin light chain 3 (21), and rat cardiac myosin heavy chain (22) genes. Sequences are aligned directly with the chicken actin gene homologous sequence and are subdivided into two homologous regions (1 and 3, boxed) that flank a more variable central core sequence region (designated 2). Uppercase letters represent nucleotides identical with nucleotides in the chicken actin sequence. Lowercase letters indicate variability at these positions. Numbers to the left of the sequences show the distance (bp) from the 5' border of the homologous sequences to the transcription start sites. The consensus sequence below shows the most prevalent nucleotide at each position in the five homologous muscle gene sequences and the variable nucleotides at each position.
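The consensus described in the Fig. 4 legend is the most frequent nucleotide in each aligned column. The sketch below computes it for the five 17-bp sequences, as we read them from the figure. One rule is our own assumption rather than anything stated in the legend: when two bases tie for the top count, the tie goes to the chicken α-actin reference base, since all sequences are aligned to that reference. The names are illustrative.

```python
from collections import Counter

# The five 17-bp sequences of Fig. 4 as we read them from the figure,
# written 5' to 3'; the chicken alpha-actin reference comes first.
FIG4 = {
    "chick alpha-actin": "GCCCGACACCCAAATAT",
    "rat alpha-actin": "GCCCAACACCCAAATAT",
    "quail troponin I": "GGACGACCAGCAAATAT",
    "chick myosin light chain 3": "GCCCGGACAAGGAATAT",
    "rat cardiac myosin heavy chain": "AGACAGGGGACAAATAT",
}


def consensus(sequences, reference):
    """Most common base per column; a tie is resolved in favour of the reference base.

    Returns the consensus string and the 1-based columns where a tie occurred.
    """
    seqs = [s.upper() for s in sequences]
    length = len(reference)
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    bases, ties = [], []
    for col in range(length):
        counts = Counter(s[col] for s in seqs)
        top = max(counts.values())
        tied = sorted(base for base, n in counts.items() if n == top)
        if len(tied) > 1:
            ties.append(col + 1)
        ref_base = reference[col].upper()
        bases.append(ref_base if ref_base in tied else tied[0])
    return "".join(bases), ties


cons, ties = consensus(FIG4.values(), FIG4["chick alpha-actin"])
print(cons)  # GCCCGACACCCAAATAT
print(ties)  # [8, 9, 10]
```

Regions 1 and 3 have a clear majority base at every column. Three of the four core (region 2) columns are two-way ties, which fits the legend's description of the core as the more variable region.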

encoding other actin-binding proteins have protein domains and exons similar in structure to troponin I exon VII, consistent with an exon-shuffling model of protein evolution (30). It will also be of interest to compare the exon structure of the actin-binding domains of the genes encoding the slow skeletal and cardiac troponin I isoforms, since the amino acid sequences of the actin-binding domains of these three troponin I isoforms are highly conserved (Fig. 1). In contrast to the actin-binding domain, the troponin C-binding domain of the fast skeletal muscle troponin I gene (1) is split between exons IV and V. The exact borders of the troponin C-binding domain are uncertain and might extend into exon III (1). The troponin C-binding domain is located in a region containing both variable and conserved amino acids in the three troponin I isoforms (Fig. 1). The partial conservation of amino acid residues among all isoforms in the region split by exons IV and V of the fast troponin I gene suggests that the troponin C-binding domain originated prior to the duplications of the ancestral gene that gave rise to this gene family. Future comparative analysis of the exon organization of cardiac and slow troponin I genes in this troponin C-binding-domain region, as well as in the other homologous protein-coding regions, should reveal the evolutionary history of the troponin I gene family and the origins of the functionally specialized domains of the cardiac, fast, and slow troponin I isoforms of vertebrate muscles.
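Statements such as "partial conservation of amino acid residues among all isoforms" rest on counting alignment columns in which every isoform carries the same residue. In Fig. 1 these columns are drawn as vertical lines. The sketch below counts such columns and compares their density in two windows of an alignment. The three-sequence alignment is a made-up toy, not troponin I data, and the function names are hypothetical. A gap ("-") in any sequence disqualifies a column.

```python
def conserved_columns(aligned):
    """1-based columns where every sequence has the same residue; gap columns ('-') never count."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return [i + 1 for i, column in enumerate(zip(*aligned))
            if column[0] != "-" and all(res == column[0] for res in column)]


def conserved_fraction(aligned, start, end):
    """Fraction of columns start..end (1-based, inclusive) conserved in every sequence."""
    hits = [c for c in conserved_columns(aligned) if start <= c <= end]
    return len(hits) / (end - start + 1)


# Toy alignment of three hypothetical isoforms (made up, not troponin I data).
toy = [
    "MSDKRALKVNMD",
    "MAE-KAVKVNMD",
    "MGSEKQIKVNLD",
]
print(conserved_columns(toy))  # [1, 8, 9, 10, 12]
print(round(conserved_fraction(toy, 1, 6), 2), round(conserved_fraction(toy, 7, 12), 2))  # 0.17 0.67
```

In this toy example the first half is mostly divergent and the second half mostly conserved, the same kind of contrast Fig. 1 shows between the amino- and carboxyl-terminal halves of the real isoforms.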

Structural comparison of cardiac, fast, and slow troponin I proteins indicates that these isoforms are evolving nonrandomly. The amino-terminal halves of these isoforms are highly divergent, whereas their carboxyl-terminal halves are highly homologous over long stretches of protein sequence having no known functional or structural importance (ref. 1 and Fig. 1). The cardiac troponin I isoform also has a 26-amino acid amino-terminal extension, which may be important in cardiac muscle function through phosphorylation of a serine in this extension peptide (31). The exon structure of the quail fast troponin I gene suggests that gene conversion might maintain the homology of the carboxyl termini of these three isoforms and that splice-junction sliding may have played a role in the origin of the amino-terminal extension of the cardiac isoform.

The introns of the fast troponin I gene are distributed unevenly along the RNA coding sequence, leading to a heterogeneous distribution of exon sizes (Fig. 3). Exon sizes range from 7 bp to 300 bp. Only two of the eight exons (exons V and VII) are in the 140-bp size class, the most prevalent exon size in eukaryotic genes (32). The first five exons are flanked by the majority of intron DNA (Fig. 3). This gene organization disperses the sequences encoding the amino-terminal half of the protein over 3.5 kb of DNA and concentrates the exons encoding the carboxyl-terminal half of the protein into ~1 kb of DNA. This organization could favor gene conversion at the 3' ends of the isoform genes, thereby maintaining the striking homology of the carboxyl-terminal halves of the isoforms. Gene conversion is one mechanism known to act on gene families to maintain their sequence homogeneity (33). Gene conversion has been proposed to account for sequence homologies in two γ-globin genes (34) and in the variable regions of immunoglobulin genes (33). Such a gene-conversion mechanism predicts that the exon/intron organization of the three troponin I isoform genes is similar in their carboxyl-terminal gene regions and that the sequences of intron 7 of these genes are more homologous than those of introns in the more divergent 5' regions. In this regard, it will also be of interest to determine whether the troponin I isoform genes are closely linked, an organization that might enhance such gene-conversion events.

The intron/exon organization of the fast troponin I gene also suggests an origin for the 26-residue amino-terminal peptide in the cardiac troponin I isoforms (Fig. 1). If the cardiac troponin I gene is shown to have an intron positioned immediately downstream of the ATG initiation codon, similar to intron 2 of the fast troponin I gene, then the cardiac amino-terminal extension likely arose as an insertion by a splice-junction-sliding mechanism that created an exon larger by 78 bp (26 codons) than the corresponding exon in the fast muscle gene. This type of mechanism has been proposed to account for insertions in the serine protease family (35). Alternatively, the amino-terminal differences between the cardiac and skeletal muscle isoforms may have evolved by exon insertion or by exon loss.

The troponin I gene shares two structural features with evolutionarily unrelated but muscle-specific contractile-protein genes. These shared features may be functionally and evolutionarily significant in directing their muscle-specific expression. First, troponin I and other muscle genes have a homologous sequence upstream of their promoters. This 17-bp sequence is located 100 bp upstream of the transcription start site in both chicken and rat α-actin genes (20) and at -329 in the fast troponin I gene (Figs. 2 and 4). These sequences align into three regions: a G+C-rich region and an A+T-rich region that flank a 4-bp central core region of more variable sequence (Fig. 4). Single-base shifts and inversions in these alignments increase the extent of homology of these sequences. Homologous sequences are also located at -411 in the 5' upstream region of the cardiac myosin heavy chain gene (22) and at -256 of the myosin light chain 3 gene (21). The identification of homology between cardiac and skeletal muscle gene promoters is not unexpected, since cardiac and skeletal genes are coexpressed in embryonic skeletal muscle (36-38). A homologous sequence has not yet been identified in the flanking regions of the myosin light chain 1 and myosin light chain 2 gene promoters (21, 26), but data on the 5' flanking sequences of these genes are limited. Because this sequence lies far upstream (-329) of the troponin I start site, the corresponding sequence in these genes could lie farther upstream than their published sequences extend. Extensive computer analysis of mammalian gene sequences in the sequence data base revealed that these homologies in the 5' flanking regions of muscle genes are muscle-specific and are not found in nonmuscle gene sequences. Furthermore, recent gene-transfection studies show that the region of the troponin I homologous sequence at -329 is required for the muscle-specific expression of troponin I genes (15) and that muscle-specific regulatory sequences are also localized in the skeletal α-actin gene promoter region (39). We suggest that these homologous sequences are cis-acting transcriptional control elements (see ref. 40 for review) involved in the coordinate control of muscle genes, analogous to consensus promoter sequences involved in the regulation of heat shock (41) and steroid-induced genes (42).

The second structural feature that the troponin I gene

shares with a-actin (24, 25, 27), cardiac myosin heavy chain(22), myosin light chain 2 (26), and myosin light chain 3 genes(21) is that a large intron separates the promoter and first exonencoding nontranslated RNA sequences from the protein-coding region in the second exon. This gene organizationcould facilitate recombinational genome "shuffling" of mus-cle gene promoters and their transcriptional regulatory ele-ments and raises the possibility that the first intron of musclegenes has a regulatory function.
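The alignment of the 17-bp upstream sequences described above notes that single-base shifts increase their apparent homology. A minimal Python sketch can illustrate that idea. It scores ungapped percent identity between two short sequences at offsets of up to one base in either direction and reports the best offset. The function names `identity_at_offset` and `best_shifted_identity`, and the toy sequences, are hypothetical. They are not the troponin I or α-actin sequences of Fig. 4, and the sketch reproduces neither the inversions mentioned in the text nor the homology programs actually used for this analysis.

```python
"""Sketch: ungapped percent identity between two short sequences,
allowing a small offset (single-base shift) between them.

The sequences below are hypothetical placeholders, NOT the
troponin I / alpha-actin sequences of Fig. 4.
"""


def identity_at_offset(a: str, b: str, offset: int) -> tuple[int, int]:
    """Return (matches, overlap_length) when b is shifted by `offset`
    positions relative to a (positive offset slides b to the right)."""
    matches = 0
    overlap = 0
    for i, base in enumerate(a):
        j = i - offset  # position in b aligned with position i in a
        if 0 <= j < len(b):
            overlap += 1
            if base == b[j]:
                matches += 1
    return matches, overlap


def best_shifted_identity(a: str, b: str, max_shift: int = 1) -> tuple[int, float]:
    """Try offsets -max_shift..+max_shift and return
    (best_offset, percent_identity over the overlapping bases)."""
    a, b = a.upper(), b.upper()
    best = (0, 0.0)
    for offset in range(-max_shift, max_shift + 1):
        matches, overlap = identity_at_offset(a, b, offset)
        if overlap == 0:
            continue
        pct = 100.0 * matches / overlap
        if pct > best[1]:
            best = (offset, pct)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences: the second is the first shifted left by one base.
    print(best_shifted_identity("GGCCAAATTT", "GCCAAATTTA"))  # (1, 100.0)
```

Identity is computed only over the overlapping bases, so a shifted alignment is scored on a slightly shorter stretch. For motifs as short as 17 bp, a real analysis would likely also impose a minimum overlap or use a full alignment score with gap penalties.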

We thank Irene Althaus and Maggie Ober for their expert assistance with the DNA sequence and computer analysis of the troponin I gene, and Dr. William Pearson for generously providing his sequence homology computer programs and his expertise in the analysis of the troponin I promoter sequence homology. We thank Dr. Stephen Konieczny for generously providing his data on the 5' and 3' boundaries of the troponin I gene and for extensive discussion, and we thank Gladys Bryant for preparing this manuscript. Data handling and analysis were made possible in part by the use of hardware from the University of Virginia Clinical Research Center, Grant M01 RR00847. This investigation was supported by a research grant from the National Institutes of Health.

1. Wilkinson, J. M. & Grand, R. J. A. (1978) Nature (London) 271, 31-35.

2. Potter, J. D. & Gergely, J. (1974) Biochemistry 13, 2697-2703.

3. Head, J. F. & Perry, S. V. (1974) Biochem. J. 137, 145-154.

4. Wilkinson, J. M., Perry, S. V., Cole, H. A. & Trayer, I. P. (1972) Biochem. J. 127, 215-228.

5. Horwitz, J., Bullard, B. & Mercola, D. (1979) J. Biol. Chem. 254, 350-355.

6. Devlin, R. B. & Emerson, C. P., Jr. (1978) Cell 13, 599-611.

7. Hastings, K. E. M. & Emerson, C. P., Jr. (1982) Proc. Natl. Acad. Sci. USA 79, 1153-1157.

8. Rigby, P. W. J., Dieckmann, M., Rhodes, C. & Berg, P. (1977) J. Mol. Biol. 113, 237-251.

9. Lawn, R. M., Fritsch, E. F., Parker, R. C., Blake, G. & Maniatis, T. (1978) Cell 15, 1157-1174.

10. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.

11. Maniatis, T., Hardison, R. C., Lacy, E., Lauer, J., O'Connell, C., Quon, D., Sim, G. K. & Efstratiadis, A. (1978) Cell 15, 687-701.

12. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.

13. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.

14. Berk, A. J. & Sharp, P. A. (1977) Cell 12, 721-732.

15. Konieczny, S. F. & Emerson, C. P., Jr. (1985) Mol. Cell. Biol. 5, 2423-2432.

16. Sharp, P. A. (1981) Cell 23, 643-646.

17. Goldberg, M. (1979) Dissertation (Stanford Univ., Stanford, CA).

18. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.

19. Tate, V. E., Finer, M. H., Boedtker, H. & Doty, P. (1983) Nucleic Acids Res. 11, 91-104.

20. Ordahl, C. P. & Cooper, T. A. (1983) Nature (London) 303, 348-349.

21. Nabeshima, Y., Fujii-Kuriyama, Y., Muramatsu, M. & Ogata, K. (1984) Nature (London) 308, 333-338.

22. Mahdavi, V., Chambers, A. P. & Nadal-Ginard, B. (1984) Proc. Natl. Acad. Sci. USA 81, 2626-2630.

23. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142.

24. Chang, K. S., Rothblum, K. M. & Schwartz, R. J. (1985) Nucleic Acids Res. 13, 1223-1237.

25. Zakut, R., Shani, M., Givol, D., Newman, S., Yaffe, D. & Nudel, U. (1982) Nature (London) 298, 857-859.

26. Nudel, U., Calvo, J. M., Shani, M. & Levy, Z. (1984) Nucleic Acids Res. 12, 7175-7186.

27. Fornwald, J. A., Kuncio, G., Peng, I. & Ordahl, C. P. (1982) Nucleic Acids Res. 10, 3861-3876.

28. Steinmetz, M., Moore, K. W., Frelinger, J. G., Sher, B. T., Shen, F.-W., Boyse, E. A. & Hood, L. (1981) Cell 25, 683-692.

29. Tung, A., Sippel, A. E. & Schutz, G. (1980) Proc. Natl. Acad. Sci. USA 77, 5759-5763.

30. Gilbert, W. (1978) Nature (London) 271, 501.

31. Solaro, R. J., Moir, A. J. G. & Perry, S. V. (1976) Nature (London) 262, 615.

32. Naora, H. & Deacon, N. J. (1982) Proc. Natl. Acad. Sci. USA 79, 6196-6200.

33. Baltimore, D. (1981) Cell 24, 592-594.

34. Slightom, J. L., Blechl, A. E. & Smithies, O. (1980) Cell 21, 627-638.

35. Craik, C. S., Rutter, W. J. & Fletterick, R. (1983) Science 220, 1125-1129.

36. Minty, A. J., Alonso, S., Caravatti, M. & Buckingham, M. E. (1982) Cell 30, 185-192.

37. Toyota, N. & Shimada, Y. (1983) Cell 33, 297-304.

38. Hallauer, P. L. & Emerson, C. P., Jr. (1985) J. Cell. Biochem. 9, 65.

39. Melloul, D., Aloni, B., Calvo, J., Yaffe, D. & Nudel, U. (1984) EMBO J. 3, 983-990.

40. Davidson, E. H., Jacobs, H. T. & Britten, R. J. (1983) Nature (London) 301, 468-470.

41. Pelham, H. R. B. (1982) Cell 30, 517-528.

42. Grez, M., Land, H., Giesecke, K., Schutz, G., Jung, A. & Sippel, A. E. (1981) Cell 25, 743-752.
