

Copyright © 1997 by the Genetics Society of America

Molecular Evolution at the decapentaplegic Locus in Drosophila

Stuart J. Newfeld, Richard W. Padgett,1 Seth D. Findley,* Brent G. Richter,* Michele Sanicola,3 Margaret de Cuevas4 and William M. Gelbart

*The Biological Laboratories and *The Museum of Comparative Zoology Laboratories, Harvard University, Cambridge, Massachusetts 02138

Manuscript received July 31, 1996 Accepted for publication October 23, 1996

ABSTRACT

Using an elaborate set of cis-regulatory sequences, the decapentaplegic (dpp) gene displays a dynamic pattern of gene expression during development. The C-terminal portion of the DPP protein is processed to generate a secreted signaling molecule belonging to the transforming growth factor-β (TGF-β) family. This signal, the DPP ligand, is able to influence the developmental fates of responsive cells in a concentration-dependent fashion. Here we examine the sequence level organization of a significant portion of the dpp locus in Drosophila melanogaster and use interspecific comparisons with D. simulans, D. pseudoobscura and D. virilis to explore the molecular evolution of the gene. Our interspecific analysis identified significant selective constraint on both the nucleotide and amino acid sequences. As expected, interspecific comparison of protein coding sequences shows that the C-terminal ligand region is highly conserved. However, the central portion of the protein is also conserved, while the N-terminal third is quite variable. Comparison of noncoding regions reveals significant stretches of nucleotide identity in the 3' untranslated portion of exon 3 and in the intron between exons 2 and 3. An examination of cDNA sequences representing five classes of dpp transcripts indicates that these transcripts encode the same polypeptide.

INTERCELLULAR communication during development is essential for the generation of the pattern of organs and tissues that characterize each species. In Drosophila melanogaster the decapentaplegic (dpp) gene encodes a key signaling protein (PADGETT et al. 1987) that is necessary for a variety of developmental decisions. These include the determination of dorsal ectoderm in the early embryo (IRISH and GELBART 1987), larval gut differentiation (IMMERGLÜCK et al. 1990; PANGANIBAN et al. 1990), the formation of adult wings (POSAKONY et al. 1991) and oogenesis (TWOMBLY et al. 1996). By genetic and molecular analyses, several of the patterning functions of dpp were separated into three major genetic domains (shv, Hin and disk; SEGAL and GELBART 1985). Molecular studies suggest that dpp encodes a single polypeptide, belonging to the transforming growth factor-β (TGF-β) family, that is expressed in a highly localized pattern by large arrays of cis-regulatory sequences (PADGETT et al. 1987; ST. JOHNSTON et al. 1990).

Corresponding author: William M. Gelbart, The Biological Laboratories, Harvard University, 16 Divinity Ave., Cambridge, MA 02138. E-mail: [email protected]

1 Present address: Waksman Institute, Department of Molecular Biology and Biochemistry, The Cancer Institute of New Jersey, Rutgers University, Piscataway, NJ 08855.

2 Present address: Department of Biochemistry, University of Washington, Seattle, WA 98185.

3 Present address: Department of Viruses and Growth Control, Biogen, Inc., Cambridge, MA 02142.

4 Present address: Department of Embryology, Carnegie Institution of Washington, Baltimore, MD 21210.

Genetics 145: 297-309 (February, 1997)

All members of the TGF-β family of signaling molecules share common structural features. Precursor polypeptides are proteolytically cleaved to generate an N-terminal fragment (the proregion) thought to be involved in dimerization and secretion of the precursor and a C-terminal fragment that forms the biologically active ligand (reviewed in MASSAGUÉ 1990). The secreted ligand acts through receptor kinases on the surface of responsive cells (reviewed in MASSAGUÉ et al. 1994). The roughly 100 amino acid ligand contains the six/seven cysteines that are strictly conserved in all family members. Based upon additional sequences in the ligand region, the family has been divided into several subfamilies revealing extensive evolutionary conservation. The largest subfamily, referred to as the DPP/BMP subfamily, includes the D. melanogaster genes dpp, 60A (WHARTON et al. 1991) and screw (ARORA et al. 1994) and, at present, 12 vertebrate genes including the bone morphogenetic proteins (BMPs; reviewed in KINGSLEY 1994). Within this subfamily, the most similar molecules are dpp and BMP2 and BMP4. The ability of human BMP2 and BMP4 to rescue dpp null mutations (PADGETT et al. 1993) and the demonstration that DPP can induce bone formation in mammalian tissue culture assays (SAMPATH et al. 1993) led to the proposal that dpp is homologous to both of these human genes. Comparison of dpp coding sequences from two distantly related insects (D. melanogaster and the grasshopper Schistocerca americana) with human BMP4 reveals that the


ligand region displays significant conservation (>75% amino acid identity between each species; NEWFELD and GELBART 1995). However, the nature of the selective constraints on dpp that produce such functional and sequence conservation is unknown.

One proven method for identifying selective forces acting on a single gene is to examine the level of DNA sequence variation within a species and between closely related species. The ability to distinguish functionally distinct nucleotides (e.g., transcribed vs. nontranscribed and synonymous vs. nonsynonymous positions in the coding region) allows the detection of selective constraints on many aspects of gene function. For example, highly conserved sequences detected by interspecific comparison of genomic fragments from the shv domain of dpp proved to be important regulatory elements involved in midgut differentiation (WAK et al. 1995). Interspecific comparison of the rates of synonymous and nonsynonymous substitution between repetitive and unique regions of the Drosophila neurogenic locus mastermind revealed that each region was under distinct levels of selective constraint (NEWFELD et al. 1993, 1994). Alternatively, intraspecific studies of nucleotide polymorphism at the Adh (KREITMAN 1983) and Gpdh (WELLS 1996) loci of Drosophila were able to identify effects of purifying selection and genetic drift.

In this paper and the accompanying paper by RICHTER et al. (1997) we report interspecific and intraspecific sequence analyses of dpp designed to further our understanding of dpp regulation and molecular evolution. For the interspecific comparison, we chose to examine the Hin region, a block of DNA that contains one of the five classes of dpp transcript and all regulatory sequences necessary for normal dorsal-ventral patterning of the embryo (HOFFMANN and GOODMAN 1987). A portion of the Hin region that includes the two protein coding exons and the large intron separating them was sequenced from genomic DNA of D. melanogaster, D. pseudoobscura and D. virilis. The nucleotide sequence of this region from D. simulans, obtained from RICHTER et al. (1997), is included in the analysis. Molecular studies of Adh indicate that D. melanogaster and D. virilis diverged ~40 mya, D. melanogaster and D. pseudoobscura diverged ~25 mya and that D. simulans has very recently diverged (~2 mya) from D. melanogaster (RUSSO et al. 1995).

Comparison of dpp protein coding sequences reveals that the N-terminal part of the proregion is much more variable than the central part of the proregion or the C-terminal ligand region. Comparison of nontranslated and nontranscribed sequences identified significant stretches of nucleotide identity in the 3' untranslated portion of exon 3 and the intron. In addition, extended sequence analysis of D. melanogaster genomic DNA and cDNAs representing the five classes of dpp transcripts confirms the proposal that dpp encodes only a single polypeptide.

MATERIALS AND METHODS

Cloning and characterization of dpp genomic DNA and dpp cDNAs: Genomic libraries used in this study are: D. melanogaster isogenic dp cn bw in λ EMBL3 (THUMMEL 1993) and λ Dash II (FINELLI et al. 1994), D. virilis and D. pseudoobscura (BLACKMAN and MESELSON 1986). Libraries were screened for dpp sequences as described in WHARTON et al. (1996). Chromosome walking, restriction mapping, subcloning and dideoxy sequencing were performed according to standard methods (SAMBROOK et al. 1989). The D. melanogaster cDNA libraries and the dpp genomic fragments used in library screening to identify specific classes of dpp cDNAs are described in ST. JOHNSTON et al. (1990).

DNA sequence analyses: Nucleotide sequences were compiled using the University of Wisconsin's Genetics Computing Group programs (DEVEREUX et al. 1984). Alignments of nucleotide and inferred amino acid sequences were completed using the programs of PUSTELL and KAFATOS (1984). Codon usage was calculated with SYNSUB (LEWONTIN 1989) and the program of LI et al. (1985) was used to calculate the number of synonymous (Ks) and nonsynonymous (Ka) substitutions per site.
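The distinction between synonymous and nonsynonymous changes can be illustrated with a short sketch. It is not the method of LI et al. (1985), which weights sites by degeneracy and corrects for multiple substitutions; it only classifies each differing codon pair in a gap-free, in-frame alignment by whether the encoded amino acid changes. The function name and the toy sequences are hypothetical and are not taken from the dpp data.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a, seq_b):
    """Count differing codon pairs as synonymous or nonsynonymous.

    Both sequences must be in frame, gap free and of equal length.
    This is a raw count of codon pairs, not a per-site rate.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and gap free")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODE[codon_a] == CODE[codon_b]:
            synonymous += 1      # same amino acid, different codon
        else:
            nonsynonymous += 1   # amino acid replacement
    return synonymous, nonsynonymous


# Toy example: CAG -> CAA keeps Gln; CAA -> CAT changes Gln to His.
print(classify_codon_differences("ATGCAGCAACTG", "ATGCAACATCTG"))
```

A per-site estimate would additionally require counting the synonymous and nonsynonymous sites in each sequence, which this sketch deliberately omits.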

GenBank accession numbers: Accession numbers for dpp genomic sequences reported in this paper are as follows: U63850, D. pseudoobscura disk region homologous to D. melanogaster 98EcoRI; U63851, D. melanogaster disk region 98EcoRI; U63852, D. melanogaster disk region 109HindIII; U63853, D. melanogaster exon 1B region; U63854, D. simulans (isogenic net dp #1) Hin region; U63855, D. virilis Hin region; U63856, D. pseudoobscura Hin region; U63857, D. melanogaster shv and Hin region.

RESULTS

The dpp locus encodes a single protein: The genetic complexity, large physical size and extended transcriptional activity associated with the dpp locus suggested that a detailed analysis of dpp genomic and cDNA sequences would prove valuable. Figure 1A shows the physical and genetic map of dpp and the location of the five classes of mRNAs produced during development (ST. JOHNSTON et al. 1990). Initial characterization of 32 cDNA clones, including at least one representative of each transcript class, determined that all contained three exons of which the last two were in common. The common exons encode the TGF-β family protein that corresponds to DPP (PADGETT et al. 1987; ST. JOHNSTON et al. 1990). Subsequently, the unique 5' regions of representative cDNAs from each transcript class were sequenced to ascertain differences between the classes. Figure 2A shows the location of 5' sequences from each cDNA aligned to genomic sequence from the dpp locus (described below). The start sites for transcripts A and C were confirmed by S1 nuclease mapping (ST. JOHNSTON et al. 1990). The start of transcript B was mapped to genomic sequence from the exon 1B region by 5' RACE using cDNA KP1 (R. BLACKMAN, personal communication). None of the initiation sites appear to be associated with TATA boxes. However, the cDNAs corresponding to transcript D and the Rare transcript class may not be full length.

The splice junction between the alternate exon 1 and


[Figure 1 image not reproduced. Recoverable labels: (A) shv, Hin and disk regions, exon 1B and tRNATyr; (B) sequenced segments for D. melanogaster (dp cn bw), D. simulans (net dp #1), D. pseudoobscura and D. virilis; (C) tree of the four species with nodes at 25 Mya and 40 Mya.]

FIGURE 1.-Genetic and physical map of the dpp locus. (A) Schematic map of the D. melanogaster dpp locus with molecular coordinates from the 22E-F chromosome walk. The shaded rectangles above the coordinate line designate the approximate locations of three genetically defined regions of the locus. The position of exons and their splicing patterns are shown below the coordinate line with filled rectangles depicting protein coding sequences and open rectangles untranslated regions. The position and direction of transcription for two tRNATyr genes within the dpp locus are indicated by arrows (ST. JOHNSTON et al. 1990). (B) A schematic map showing the relative position of genomic DNA sequenced from each species. The location and the number of nucleotides sequenced as well as the dpp exons and the tRNATyr genes contained within Hin region sequences from each species are shown. All D. melanogaster sequences were obtained from a strain isogenic for a dp cn bw chromosome. The D. melanogaster disk sequences correspond to restriction fragments 98EcoRI (BS 1.1; BLACKMAN et al. 1991) and 109HindIII (part of BS 3.0; BLACKMAN et al. 1991) from the 22E-F chromosome walk. The total D. melanogaster sequence is 24,631 bp. The D. pseudoobscura sequence totals 9918 bp and the disk region sequences are homologous to the D. melanogaster 98EcoRI sequences. The intron sizes for non-D. melanogaster species are not to scale. For D. melanogaster only, sequenced cDNAs (see Figure 2 for designations) representing all five transcript classes are depicted. (C) The ancestral relationship between the Drosophila species studied. The divergence times are taken from RUSSO et al. (1995). Mya, million years ago.

common exon 2 of each transcript class is shown in Figure 2B. Exon 1 of the B, A and Rare transcript classes are completely distinct. Exon 1 of transcript D begins 240 bp upstream of transcript C and terminates at the same splice donor site as exon 1C. Each cDNA splices to the same splice acceptor site at the 5' end of exon 2.

Fourteen bases 3' to the splice acceptor site is the common ATG that initiates translation of the DPP open reading frame. Upstream of the common ATG, each transcript class has an in-frame stop codon that defines the 5' end of the open reading frame except for transcript B. Transcript B has one additional ATG codon, in exon 1B, upstream of the common ATG in exon 2. If used, the ATG in exon 1B would generate a proline-rich region of 36 additional amino acids at the N-terminus of the DPP polypeptide. This would place the secretion signal sequence inside the protein rather than at the N-terminus. The sequence surrounding the ATG in exon 1B is a poor match to the consensus for translation initiation in Drosophila (CAVENER and RAY 1991). Database searches using BLASTX (GISH and STATES 1993), which compares a translated nucleotide sequence against the protein sequence databases, identified no open reading frames similar to the 36 amino acid extension potentially encoded from the ATG in exon 1B.


[Figure 2 image not reproduced. Recoverable labels: (A) 5' ends of transcript A (cDNAs KE26 and BEh1), transcript C (cDNA KE3), transcript D (cDNA BEs1), the Rare transcripts (cDNA BEs11) and transcript B (cDNA H1, exon 1B contig); (B) alternate exon 1 and common exon 2 splice junctions with the start of the ORF for each transcript class; (C) methionine and stop codon map of the three forward reading frames, with the exon 2 and exon 3 ORFs marked.]

FIGURE 2.-Sequence analyses of D. melanogaster dpp transcribed regions. (A) The genomic sequence flanking the first nucleotide of the longest cDNA from each transcript class is shown. (B) The genomic sequence flanking the splice junction between the alternate exon 1 and the common exon 2 for each transcript class is shown. Note that all transcript classes splice to the exact nucleotide of exon 2. With the exception of transcript B, an in-frame stop codon precedes the initiator methionine in the common open reading frame in each transcript class. Transcript B could encode an additional 36 amino acid polypeptide if the first in-frame methionine was used. However, we believe this is unlikely (see text). (C) The location of all methionine and stop codons in the three forward reading frames (as determined by the direction of dpp transcription) in the Hin and shv region contig are shown. Within each frame, short lines represent methionines and long lines represent stop codons. The DPP common open reading frame in exon 2 (nucleotides 12607-13474) and exon 3 (nucleotides 15192-16090) is shown. No other significant open reading frames are detected.

Genetic evidence from the Tegula mutation, a chromosomal inversion that removes exon 1B without any detectable loss-of-function dpp mutant phenotype (R. RAY and W. GELBART, unpublished observations), suggests that this transcript and its extended open reading frame are dispensable. Overall, it appears unlikely that the ATG of exon 1B is used; rather, translation of all dpp transcripts appears to initiate at the common ATG. Downstream of the stop codon, three clustered poly-A addition sites are utilized by transcripts of all classes.

To further characterize the transcribed region of dpp, 17,673 bp of genomic DNA were sequenced. The analysis included contiguous sequence from the distal portion of the Hin region and the proximal portion of the shv region (Figure 1B). A transformation construct containing genomic DNA corresponding to the sequenced region is sufficient to rescue all Hin and shv functions (HURSH et al. 1993). An additional 1508 bp was sequenced from the region surrounding exon 1B (R. BLACKMAN, personal communication). The genomic sequence precisely positions each exon relative to each other ensuring that the 5' ends of the cDNAs are intact, as well as eliminating the possibility of microexons and introns. From the genomic sequence we learned that the most proximal poly-A addition site is located 300 bp away from the 3' end of the distal tRNATyr embedded in the dpp locus.

The sequenced region was carefully examined to identify any additional open reading frames (ORFs) that might contribute to dpp function (Figure 2C). No significant ORFs were found other than the two common ORFs contained in the dpp cDNAs. Searching the sequence databases with all D. melanogaster dpp sequences using BLASTN (ALTSCHUL et al. 1990), which compares a nucleotide sequence against the nucleotide sequence databases, and BLASTX (GISH and STATES 1993) identified TGF-β family members and tRNATyr sequences only. The proposal that the dpp locus encodes a single protein of the TGF-β family is clearly supported by the sequence analysis.
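The kind of scan summarized in Figure 2C, locating every methionine and stop codon in the three forward reading frames, can be sketched in a few lines. This is an illustrative reconstruction rather than the software used in the study; the function names and the toy sequence are hypothetical, and positions are 0-based offsets into the input string rather than contig coordinates.

```python
STOPS = {"TAA", "TAG", "TGA"}


def scan_forward_frames(seq):
    """Return ATG and stop codon positions for forward frames 0, 1 and 2."""
    seq = seq.upper()
    frames = {}
    for frame in range(3):
        atgs, stops = [], []
        # Step through complete codons only, starting at the frame offset.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG":
                atgs.append(i)
            elif codon in STOPS:
                stops.append(i)
        frames[frame] = {"atg": atgs, "stop": stops}
    return frames


def longest_orf(seq):
    """Longest ATG-to-stop stretch as (frame, start, end_exclusive), or None."""
    best = None
    for frame, marks in scan_forward_frames(seq).items():
        for start in marks["atg"]:
            # First in-frame stop downstream of this ATG closes the ORF.
            end = next((s + 3 for s in marks["stop"] if s > start), None)
            if end is None:
                continue  # open to the end of the sequence; not reported
            if best is None or end - start > best[2] - best[1]:
                best = (frame, start, end)
    return best


toy = "CCATGAAATAGATGCCCGGGTTTTAAGG"  # made-up sequence, not dpp
print(scan_forward_frames(toy)[2])
print(longest_orf(toy))
```

Applied to a genomic contig, the per-frame lists correspond to the short and long tick marks of a plot like Figure 2C, and a length threshold on `longest_orf`-style hits is one way to decide which ORFs count as significant.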

Molecular evolution of the dpp protein: To examine the molecular evolution of dpp, most of the Hin region of the dpp locus was sequenced from three additional Drosophila species: D. simulans, D. pseudoobscura and


D. virilis. By aligning the sequences from all four species, important regulatory elements and regional variation in selective constraints on the DPP protein can be identified. The amount of genomic sequence obtained from each species and the features of the dpp locus included in the sequenced regions are shown in Figure 1B. Interestingly, the tRNATyr closest to the dpp poly-A addition site is transcribed in the same direction in D. melanogaster and D. pseudoobscura, but in the opposite direction in D. virilis. The ancestral relationship and the estimated time since divergence between the species, based upon their Adh sequences (RUSSO et al. 1995), is shown in Figure 1C.

The predicted size of the DPP protein differs in each species: D. melanogaster, 588 amino acids (PADGETT et al. 1987); D. simulans, 593 amino acids; D. pseudoobscura, 621 amino acids; and D. virilis, 614 amino acids. The alignment of the inferred polypeptides is shown in Figure 3. The levels of amino acid conservation obtained from analyses of pairwise comparisons of DPP sequences (Table 1) are consistent with the ancestral relationships of the species. It is clear that very few amino acid changes have occurred in the ligand region. This region displays >90% amino acid similarity in all pairwise comparisons between the species. Most of the amino acid substitutions have occurred in the proregion, yet this region is still well conserved with ≥67% amino acid similarity in all pairwise comparisons between the species.

However, the level of similarity for the proregion is lower than for other genes compared among these Drosophila species. Most D. melanogaster vs. D. virilis gene comparisons show 80-90% amino acid similarity (e.g., ZHOU and BOULIANNE 1994; POOLE 1995). A schematic of the amino acid alignment (Figure 4) shows that the distribution of amino acid substitutions in the proregion is not uniform. With the exception of the N-terminal signal sequence, the vast majority of amino acid differences between the species are concentrated in the first third of the protein (Domain 1). This region shows a high level of both amino acid substitution and amino acid length variation. Domain 3, which falls between the propolypeptide cleavage site and the TGF-β diagnostic ligand region, also shows a high level of amino acid substitution and length variability. Domain 3 is outside the structural core of the ligand, as shown by crystallographic studies of TGF-β2 (DAOPIN et al. 1992; SCHLUNEGGER and GRÜTTER 1992) and thus may not be involved in interaction with receptors.

[Figure 3 alignment not reproduced. Recoverable labels: rows mel, sim, pse and vir; exon 2 and exon 3 boundaries; insertion/deletion events I-IX; the D. melanogaster cleavage site; the start of the ligand region; sequence lengths of 588 (mel), 593 (sim), 621 (pse) and 614 (vir) amino acids.]

FIGURE 3.-Alignment of DPP amino acid sequences. The amino acid sequences (in one-letter code) deduced from the nucleotide sequence of a D. melanogaster dpp cDNA (PADGETT et al. 1987), D. simulans net dp #1 (RICHTER et al. 1997), D. virilis and D. pseudoobscura dpp genomic DNA are aligned. The amino acids are numbered consecutively for each species and indicated at the left margin. Exon numbers are indicated above the D. melanogaster sequence. Roman numerals above the D. melanogaster sequence identify insertion/deletion events described in Table 2. Dashed lines indicate gaps required in the alignment to demonstrate maximum identity between the sequences. A period in the alignment indicates an identical amino acid in that species and D. melanogaster. The experimentally determined propolypeptide cleavage site for D. melanogaster DPP is noted (PANGANIBAN et al. 1990). The first cysteine of the C-terminal region (102 amino acids) diagnostic for the TGF-β family is indicated by the designation ligand. Variability in the location of cleavage sites between different TGF-β family members results in significant amino acid divergence before the first cysteine. Only the TGF-β diagnostic region is considered as the ligand in the analysis.

Within the conserved central segment of the proregion (Domain 2) there is just a single site of amino acid length variation. In fact, the level of conservation in Domain 2 is similar to the level seen in the ligand region, suggesting an important role for this portion of the protein in dpp function. For the related protein TGF-β1, there is biochemical evidence that the proregion is essential for secretion and dimerization (reviewed in MASSAGUÉ 1990). The pattern of amino acid conservation seen for DPP suggests that this function resides only in Domain 2 of the proregion.

The analysis of amino acid substitutions suggests that Domain 1 and Domain 3 are under significantly lower levels of selective constraint. This proposal is supported by analysis of the nucleotide sequences underlying amino acid length variation in the alignment (Table 2).


TABLE 1

Comparison of DPP amino acid sequences

               Percent identity (a)               Percent similarity (b)
           Overall  Proregion  Ligand (c)     Overall  Proregion  Ligand
mel/sim     99.5      99.3      100            99.7      99.6      100
mel/pse     82.1      79.0       97.1          85.1      82.5       97.1
mel/vir     66.1      61.4       88.2          71.3      67.0       91.2
sim/pse     82.3      79.2       97.1          85.2      82.7       97.1
sim/vir     65.8      61.0       88.2          70.9      67.2       91.2
pse/vir     70.6      66.7       90.2          73.6      69.7       93.1

(a) To calculate the percent identity, amino acids opposite a gap in either species were ignored.
(b) Percent similarity is the sum of the percent identity and the percent of conservative substitutions between the listed species. Conservative substitutions are based upon the biochemical similarity of the amino acids: D/E, K/R/H, N/Q, S/T, I/L/V, F/W/Y, and A/G (SMITH and SMITH 1990).
(c) The location of the first cysteine in the C-terminal TGF-β diagnostic ligand region is indicated in Figure 3.
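The two measures defined in the Table 1 footnotes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name and the toy sequences are hypothetical, and the only rules taken from the text are that columns opposite a gap are ignored and that the conservative groups are D/E, K/R/H, N/Q, S/T, I/L/V, F/W/Y and A/G.

```python
# Conservative substitution groups as listed in the Table 1 footnote.
CONSERVATIVE_GROUPS = ["DE", "KRH", "NQ", "ST", "ILV", "FWY", "AG"]


def percent_identity_similarity(seq_a, seq_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Columns with a gap in either sequence are ignored. Similarity is the
    percent identity plus the percent of conservative substitutions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Map each amino acid to the index of its conservative group.
    group_of = {aa: i for i, grp in enumerate(CONSERVATIVE_GROUPS) for aa in grp}
    compared = identical = conservative = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped columns do not count toward either measure
        compared += 1
        if a == b:
            identical += 1
        elif a in group_of and group_of.get(b) == group_of[a]:
            conservative += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    identity = 100.0 * identical / compared
    similarity = 100.0 * (identical + conservative) / compared
    return identity, similarity


# Toy aligned fragments (hypothetical, not taken from Figure 3).
toy_a = "MKDE--STLV"
toy_b = "MRDQAASTIV"
print(percent_identity_similarity(toy_a, toy_b))
```

In the toy pair, eight columns are compared; K/R and L/I count as conservative substitutions, while E/Q does not because E and Q fall in different groups.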

Of the nine insertion/deletion events in the alignment, only two do not involve nucleotide repeats. Triplet repeats are the most common reason for gaps in the alignment. Four of the insertion/deletion events are length variation in Opa repeats (CAX; WHARTON et al. 1985) in one or more species encoded in all three reading frames. All of the Opa based insertion/deletion events fall in the highly variable Domain 1 (see Figure 4). The amino acid insertion in Domain 3 is an expansion of a GGX repeat, encoding polyglycine, in D. pseudoobscura. Stretches of homopolymers encoded by Opa repeats and regions of polyglycine have been proposed to act as flexible spacers in proteins (BEACHY et al. 1985; NEWFELD et al. 1993, 1994). These repetitive sequences may undergo much more rapid change than nonrepetitive regions due to the higher frequency of mutational processes such as slippage replication and unequal exchange (DOVER 1986).
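As an illustration of how perfect tandem triplet repeats of the kind listed in Table 2 (for example [AAC] × 10) might be located in a nucleotide sequence, the following Python sketch uses a regular expression with a back-reference. The function name, the three-copy threshold and the test sequence are assumptions made for this example; degenerate repeats such as [CAX] or [GGX] would need a looser pattern.

```python
import re


def find_triplet_repeats(seq, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem triplet repeat.

    start is the 0-based position of the repeat, unit is the repeated
    triplet, and copies is the number of consecutive copies found.
    """
    # One captured triplet followed by at least (min_copies - 1) more copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // 3
        hits.append((m.start(), unit, copies))
    return hits


# Hypothetical fragment containing four copies of AAC.
print(find_triplet_repeats("GGTAACAACAACAACTTG"))
```

Because the search reports the leftmost match, the unit may be returned as a rotation of the repeat named in a table (GCA rather than CAG, for instance) when the flanking sequence extends the repeat by one or two bases.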

Examination of the pattern of codon usage at DPP in each species (Table 3) shows that D. melanogaster and D. simulans are extremely similar. Both species show a very strong bias toward G/C at the third position for all 18 amino acids with multiple codons. The strong G/C bias seen for all codons in D. melanogaster and D. simulans DPP is very similar to the codon bias calculated from 20 D. melanogaster peptides (LEWONTIN 1989). D. pseudoobscura largely reflects this pattern but has a reduced level of G/C bias for eight of these amino acids (F, T, A, H, Q, N, K and D). In contrast, D. virilis shows an A/T bias for eight amino acids (F, I, Y, T, H, Q, N, and D). Interestingly, three of the amino acids showing both reduced G/C bias in D. pseudoobscura and A/T bias in D. virilis are encoded by CAX in either frame 1 or 3. Examination of codon usage in D. virilis for these three amino acids in the highly repetitive developmental gene mastermind reveals the same A/T bias for H and N but Q shows a very strong G/C bias (NEWFELD et al. 1993). However, in mastermind CAX is twice as likely to encode Q as H or N while in dpp CAX encodes all three amino acids at roughly equal frequency. This suggests that amino acid frequency may affect codon usage patterns, particularly in triplet repeats.
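The third-position bias described above can be checked directly from the codon counts. The sketch below is illustrative only: the function and variable names are hypothetical, and the counts are the Table 3 entries for H, Q and N (the three amino acids encoded by CAX in frame 1 or 3), listed in the column order m, s, p, v.

```python
# Codon counts copied from Table 3, in column order (mel, sim, pse, vir).
CODON_COUNTS = {
    "H": {"CAC": (21, 23, 14, 6), "CAT": (7, 4, 11, 18)},
    "Q": {"CAG": (17, 17, 20, 12), "CAA": (7, 6, 13, 17)},
    "N": {"AAC": (18, 20, 30, 21), "AAT": (8, 6, 14, 23)},
}
SPECIES = ("mel", "sim", "pse", "vir")


def gc3_fraction(codon_counts, species_index):
    """Fraction of codons for one amino acid that end in G or C."""
    gc = total = 0
    for codon, counts in codon_counts.items():
        n = counts[species_index]
        total += n
        if codon[2] in "GC":
            gc += n
    return gc / total if total else float("nan")


for amino_acid, counts in CODON_COUNTS.items():
    row = ", ".join(
        f"{name} {gc3_fraction(counts, i):.2f}" for i, name in enumerate(SPECIES)
    )
    print(f"{amino_acid}: {row}")
```

A fraction above 0.5 indicates G/C bias at the third position and a fraction below 0.5 indicates A/T bias; for these three amino acids the D. virilis values all fall below 0.5 while the D. melanogaster values all lie above it.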

Recently, WHARTON et al. (1996) reported the amino acid changes in six dpp mutations in D. melanogaster. Five of these are simple missense mutations. In our alignment, the wild-type amino acid at those positions is invariant. However, the recessive lethal allele dpp^hr56, which has alterations in amino acids 397-400, is a complex mutation in which an H replaces four amino acids (QRNY). At position 398 in D. virilis and D. pseudoobscura, a conservative substitution of K for R is observed but no length variation is detected. As K, R and H are basic amino acids, our alignment raises the possibility that it is not the presence of H at position 397 but the loss of the other three amino acids that is responsible for the dpp^hr56 mutant phenotype.

[Figure 4 schematic: a coordinate line from 0 to 588 amino acids with the signal sequence, the proregion Domains, the cleavage site and the ligand marked; insertion/deletion events are labeled with Roman numerals above the line.]

FIGURE 4.-Schematic representation of the DPP amino acid alignment. The coordinate line represents the amino acid sequence of D. melanogaster DPP. The wide rectangle depicts important structural features of DPP and the proregion Domains displaying distinct modes of amino acid variation (see text). Vertical lines along the top of the figure represent the location of amino acid substitutions in any of the other three species. Short vertical lines indicate a substitution in any one of the other species, intermediate length lines indicate a substitution in any two of the other species, tall lines indicate a substitution in all three species. Roman numerals represent insertion/deletion events identified in the alignment (Figure 3) which are described in Table 2.

TABLE 2

Insertion/deletion events greater than a single codon in the dpp coding region

Insertion (a)  Location (b)  Species  Size (bp)  Consensus nucleotide sequence
I              25            pse       6         Nonrepetitive
II             52            mel      18         [TCAGGA] × 3
                             sim      24         [TCAGGA] × 4
                             pse      12         [GCX] (c) × 4
III            79            vir      30         [AAC] × 10 (Opa: frame 3) (d)
IV             83            pse      63         [AAC] × 21 (Opa: frame 3)
V              128           sim       9         [AAC] × 3 (Opa: frame 3)
VI             142           pse      15         [CAG] × 5 (Opa: frame 1)
VII            188           vir      18         Nonrepetitive
VIII           330           mel      15         [CAX] × 2, [GTG] × 2, GCC
                             sim      15         [CXG] × 2, [GTG] × 2, GCC
                             vir      30         [AAC] × 10 (Opa: frame 3)
                             vir       9         [ACA] × 3 (Opa: frame 2)
                             vir      15         [CAA] × 5 (Opa: frame 1)
                             pse      15         CAG, [GXG] × 3, GCT
IX             457           pse      27         [GGX] × 9

(a) Insertion numbers derive from Figure 3.
(b) The number of the last amino acid in D. melanogaster before the insertion/deletion event.
(c) X represents any nucleotide.
(d) The Opa nucleotide consensus is [CAX] as described in WHARTON et al. (1985).

Analysis of the nucleotide alignments underlying the dpp amino acid alignment provides information on the rates of synonymous (Ks) and nonsynonymous (Ka) substitutions per site between species (Table 4). In theory, synonymous substitutions are thought to be largely neutral while the majority of nonsynonymous substitutions are subject to selection. The ratio of these two events provides a measure of the strength of the selective forces acting on an amino acid sequence (RILEY 1989). Large Ks/Ka ratios indicate strong selection against nonsynonymous substitutions while small ratios indicate reduced selection on the amino acid composition of the

TABLE 3

Comparison of DPP codon usage

Columns: m, D. melanogaster; s, D. simulans (a); p, D. pseudoobscura; v, D. virilis.

           m  s  p  v              m  s  p  v              m  s  p  v              m  s  p  v
TTT phe F  5  5  6  9   TCT ser S  5  4  2  -   TAT tyr Y  2  2  4 10   TGT cys C  2  2  2  3
TTC phe F 11 11  8  7   TCC ser S 10 11  7 13   TAC tyr Y 11 11 11  6   TGC cys C  5  5  5  4
TTA leu L  3  3  4  7   TCA ser S  5  6  4  2   TAA OCH Z  -  -  -  1   TGA OPA Z  -  -  -  -
TTG leu L  -  -  4  2   TCG ser S 13 12  8  8   TAG AMB Z  1  1  1  -   TGG trp W  5  5  5  5

CTT leu L  3  2  2  2   CCT pro P  3  2  5  1   CAT his H  7  4 11 18   CGT arg R  1  1  2  3
CTC leu L 12 12 13 15   CCC pro P 12 12 11  8   CAC his H 21 23 14  6   CGC arg R 13 13 11 11
CTA leu L  3  3  5  4   CCA pro P  5  5  9  8   CAA gln Q  7  6 13 17   CGA arg R  6  6  3  3
CTG leu L 25 26 21 18   CCG pro P 15 17 11 14   CAG gln Q 17 17 20 12   CGG arg R 18 18 15 10

ATT ile I  6  5  3  7   ACT thr T  2  2  5  7   AAT asn N  8  6 14 23   AGT ser S 10 10  9  9
ATC ile I 15 16 14 11   ACC thr T 11 11 10  8   AAC asn N 18 20 30 21   AGC ser S 17 17 11 25
ATA ile I  2  2  6 11   ACA thr T  6  6 14 27   AAA lys K 12 14 19 23   AGA arg R  5  5  3  7
ATG met M  8  8  8  9   ACG thr T 12 12 12 11   AAG lys K 33 31 25 25   AGG arg R  2  1  6  4

GTT val V  4  4  5  8   GCT ala A  3  3  5  2   GAT asp D  7  8 16 20   GGT gly G  3  1  5  4
GTC val V  9  9  8  6   GCC ala A 20 20 17 13   GAC asp D 28 27 19 13   GGC gly G 12 14 16 14
GTA val V  2  1  1  1   GCA ala A  9  9 18 10   GAA glu E  4  4  8 10   GGA gly G  7  8  3  2
GTG val V 29 29 25 23   GCG ala A 14 14 22 15   GAG glu E 21 22 22 16   GGG gly G  3  3  6  3

(a) Excluding the five additional amino acids aligned with gaps II and V in D. melanogaster (Figure 3).


TABLE 4

Number of synonymous (Ks, above diagonal) and nonsynonymous (Ka, below diagonal) substitutions per site in the dpp coding region

Overall (a)
           mel      sim      pse      vir (c)
mel        -        0.0363   0.5835   1.0064
sim        0.0027   -        0.5803   0.9993
pse        0.1088   0.1059   -        0.7053
vir (c)    0.1378   0.1344   0.1127   -

Proregion (b)
           mel      sim      pse      vir (c)
mel        -        0.0299   0.5761   0.9221
sim        0.0035   -        0.5767   0.9005
pse        0.1327   0.1285   -        0.6939
vir (c)    0.1547   0.1533   0.1288   -

Ligand
           mel      sim      pse      vir (c)
mel        -        0.0196   0.6248   1.0878
sim        0        -        0.6248   1.0878
pse        0.0149   0.0149   -        0.5573
vir (c)    0.0717   0.0717   0.0563   -

(a) Before the calculation of Ks and Ka, gaps in the alignment were removed.
(b) Before the calculation of Ks and Ka, gaps in the alignment and Domain 3 were removed.
(c) Based upon a modified alignment for pairwise comparisons with D. virilis that used numerous small gaps in both species to maximize amino acid identity in highly degenerate regions (e.g., D. virilis amino acids 101-161). This alignment is available upon request.

encoded protein. For dpp, each pairwise comparison (Table 5) generates a large Ks/Ka ratio in the ligand region while the proregion gives a small Ks/Ka ratio. The overall Ks/Ka ratio is also small, reflecting the relative sizes of the ligand and the proregion (see Figure 4). These results are consistent with data on DPP amino acid conservation (Table 1) that suggested strong selective constraint on the amino acid composition of the ligand and a lower level of constraint on the proregion.
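The relationship between the per-site estimates in Table 4 and the ratios in Table 5 can be sketched as a simple division. The Python below is an illustration under stated assumptions: the function and dictionary names are hypothetical, the inputs are the rounded four-decimal values printed in Table 4, and a Ka of zero is treated as an infinite ratio.

```python
import math

# (Ks, Ka) pairs copied from Table 4.
TABLE4 = {
    ("mel/pse", "overall"): (0.5835, 0.1088),
    ("mel/pse", "proregion"): (0.5761, 0.1327),
    ("mel/pse", "ligand"): (0.6248, 0.0149),
    ("mel/sim", "ligand"): (0.0196, 0.0),
}


def ks_ka_ratio(ks, ka):
    """Ratio of synonymous to nonsynonymous substitutions per site."""
    # With no nonsynonymous substitutions the ratio is unbounded.
    return math.inf if ka == 0 else ks / ka


for (pair, region), (ks, ka) in TABLE4.items():
    print(f"{pair} {region}: Ks/Ka = {ks_ka_ratio(ks, ka):.4f}")
```

Because the inputs are rounded, the recomputed mel/pse ratios match the published values (5.3630, 4.3398 and 41.9342) only to about two decimal places.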

It is valuable to compare the D. melanogaster/D. pseudoobscura Ks/Ka ratio for dpp with other genes analyzed in these two species (12 loci; described in WELLS 1995). The dpp ligand region has the second largest Ks/Ka ratio (41.9342), falling between glycerol-3-phosphate dehydrogenase (119.3636) and heat shock protein 82 (33.6264). However, the proregion Ks/Ka ratio (4.3398) and the overall Ks/Ka ratio (5.3630) are the lowest of all genes compared among these species. The genes with the most similar Ks/Ka ratios are amylase (5.9366) and the anterior-posterior morphogen bicoid (7.2760). In calculating these values for dpp, we attempted to maximize the Ks/Ka ratio by eliminating the large number of nonsynonymous substitutions which occur in hypervariable Domain 3. The extremely low Ks/Ka ratio of the proregion contrasts with the 82.5% amino acid similarity the proregion displays between these two species. The discordance between these two measures suggests that it is not an excess of nonsynonymous substitutions that causes an extremely small Ks/Ka ratio but a distinct lack of synonymous substitutions.

TABLE 5

Ratio of synonymous (Ks) to nonsynonymous (Ka) substitutions per site in the dpp coding region

           Overall    Proregion   Ligand
mel/sim    13.2783    8.4492      ∞ (a)
mel/pse     5.3630    4.3398      41.9342
mel/vir     7.3047    5.9624      15.1710
sim/pse     5.4820    4.4891      41.9342
sim/vir     7.4343    5.8756      15.1710
pse/vir     6.2563    5.3891       9.8938

(a) No amino acid substitutions were detected in this comparison, therefore Ka = 0.

This hypothesis is further supported when the D. melanogaster/D. pseudoobscura estimates of Ks and Ka for dpp are compared with other genes analyzed in these two species. For dpp, the Ks value for the ligand is slightly higher than for the proregion and the overall protein, though the numbers are quite similar (ligand 0.6248, proregion 0.5761, overall 0.5835). The overall value for dpp is the second lowest Ks reported between D. melanogaster and D. pseudoobscura; only amylase (0.4025) has fewer synonymous substitutions per site. In contrast, the proregion and overall Ka values for dpp are among the highest reported (proregion 0.1327, overall 0.1089). Only bicoid (0.1395), pupal cuticle protein (Cart; 0.1616) and esterase-5 (0.1778) have higher values. In addition, the Ks values for these proteins with high Ka values are roughly double the Ks values for dpp. Thus, it appears that synonymous substitutions at dpp are not neutral and that there is selective constraint on all nucleotides encoding DPP. The existence of high levels of natural selection on the nucleotide composition of the dpp transcription unit is clearly shown by examining nucleotide sequences immediately downstream of the DPP stop codon.

Molecular evolution of dpp 3' untranslated sequences: Alignment of transcribed sequences 3' to the DPP translation termination codon revealed a surprising amount of identity among the species (Figure 5). Exactly 70 bp downstream of the stop codon in D. melanogaster, D. simulans and D. virilis (103 bp in D. pseudoobscura) is an extraordinary run of 106 invariant nucleotides, excepting a single nucleotide gap in D. virilis (box 1). Further downstream there are two other regions of nucleotide conservation (boxes 2 and 3) though these are not nearly as striking. No known dpp mutations map to the 3' untranslated region and experiments using reporter genes have demonstrated that this region does not affect dpp transcription (JACKSON and HOFFMANN 1994). Attempts to model dpp RNA secondary structure


[Figure 5 alignment panel: the aligned nucleotide sequences could not be recovered from the extracted text.]

FIGURE 5.-Alignment of dpp 3' untranslated sequences. The nucleotide sequences that immediately follow the stop codon terminating the DPP open reading frame are aligned. The stop codon is underlined. D. melanogaster nucleotide 1 is Hin and shv region contig nucleotide 16087. Nucleotides are numbered consecutively for each species and indicated at the left margin. Dashed lines indicate gaps required to demonstrate maximum identity between the sequences. A period indicates an identical nucleotide in that species and D. melanogaster. The alignment ends near the last nucleotide of the D. simulans Hin region contig. Regions of extensive sequence conservation (longer than 25 bp in which ≥85% of the nucleotides are invariant) are boxed.
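The boxing criterion in the legend (longer than 25 bp with at least 85% invariant nucleotides) could be applied to an alignment in several ways, and the procedure actually used is not described. The Python sketch below shows one possible reading, assumed here for illustration only: slide a fixed-length window along the alignment, keep windows that meet the threshold, and merge overlapping windows. All names and the toy alignment are hypothetical.

```python
def invariant_columns(aligned, gap="-"):
    """True for each alignment column where every sequence has the same base."""
    return [len(set(col)) == 1 and col[0] != gap for col in zip(*aligned)]


def conserved_regions(aligned, min_len=26, min_fraction=0.85):
    """Return half-open (start, end) column ranges of conserved regions.

    A window of min_len columns qualifies when at least min_fraction of
    its columns are invariant; overlapping qualifying windows are merged.
    """
    inv = invariant_columns(aligned)
    regions = []
    for start in range(0, len(inv) - min_len + 1):
        window = inv[start:start + min_len]
        if sum(window) / min_len >= min_fraction:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy alignment: 12 invariant columns followed by 6 variable columns.
toy = [
    "ACGTACGTACGTAAAAAA",
    "ACGTACGTACGTCCCCCC",
    "ACGTACGTACGTGGGGGG",
    "ACGTACGTACGTTTTTTT",
]
print(conserved_regions(toy, min_len=10, min_fraction=0.85))
```

With this windowed approach a region can extend one or more columns into variable sequence at its edges, so trimming the ends back to the nearest invariant column would be a natural refinement.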

to identify known regulatory motifs were inconclusive. Database searches using BLASTN (ALTSCHUL et al. 1990), which compares a nucleotide sequence with the nucleotide sequence databases, identified no similar sequences. However, numerous developmental genes contain regulatory sequences in the 3' untranslated regions of their mRNAs. For example, these sequences can control the subcellular location of transcripts, mRNA stability or regulate mRNA translation (CURTIS et al. 1995; ST. JOHNSTON 1995).

Molecular evolution of dpp intronic sequences: Genetic analysis in D. melanogaster has identified two types of regulatory element in the large intron separating exons 2 and 3. There are enhancers that promote expression of dpp in dorsal and terminal regions during the blastoderm stage as well as dorsal and lateral regions during later stages of development. There are also repressors that prevent dpp expression in ventral regions during the blastoderm stage (JACKSON and HOFFMANN 1994; R. PADGETT and W. GELBART, unpublished observations). The repressor elements have been molecularly characterized in D. melanogaster, by in vitro DNAse footprinting assays, as binding sites for the DORSAL protein (DL; HUANG et al. 1993).

Alignment of the four Drosophila sequences shows a section of extensive conservation in the center of the intron. In this ~800-bp segment, ~500 bp from each exon, there are numerous stretches of invariant nucleotides. This region of nucleotide conservation is shown in Figure 6. The majority of the DL binding sites (nine of 14; boxes labeled S or M) are located here and the DL binding sites outside this region are not conserved. Even within this region, there is variability in the conservation of DL binding sites. For example, sites S6 and M4 are invariant in all species (excepting one nucleotide in D. virilis) while sites S1 and S7 appear almost completely degenerate in both D. pseudoobscura and D. virilis. In addition to DL binding sites, which likely have repressor activity, there are many other stretches of invariant nucleotides (boxes 1-5) that may correspond to important enhancers. The alignment of nonconserved segments of this region, as well as the portions of the intron not shown in Figure 6, are characterized by large insertions and deletions. These are due to extended dinucleotide and trinucleotide repeats in any one of the species.

Similar results were obtained in a preliminary comparison of sequences from the nontranscribed disk region. Using in vitro DNAse footprinting techniques, SANICOLA et al. (1995) detected three ENGRAILED (EN) binding sites in a subclone from a region that was known to regulate aspects of dpp expression in imaginal disks (BS 1.1; BLACKMAN et al. 1991). EN acts to repress dpp expression in the posterior compartment of disks and mutational analyses demonstrated that these three binding sites participate in that function in vivo (SANICOLA et al. 1995). Examining the sequence of the complete BS 1.1 fragment (98EcoRI in Figure 1B) identified a total of nine consensus EN binding sites. Comparison of sequences from this region in D. melanogaster with the homologous region in D. pseudoobscura revealed that the EN binding sites that footprinted in D. melanogaster (EN1, EN2 and EN3) are not well conserved. However, other consensus EN binding sites that did not footprint in D. melanogaster (EN4A and EN4B) are identical in both species (M. SANICOLA and W. GELBART, unpublished observations). As in the intron, it is possible that the consensus EN sites which are conserved but do not footprint in D. melanogaster represent as yet uncharacterized regulatory elements.

DISCUSSION

Secreted signaling proteins of the TGF-β family are essential for proper development in a wide range of species. The instructions contained in TGF-β ligands affect the growth and differentiation of responsive cells. The TGF-β family member DPP is used many times during the Drosophila life cycle. Our laboratory and


[Figure 6 alignment panel: the aligned nucleotide sequences could not be recovered from the extracted text. Labels visible in the panel include DL binding sites S5, S6, M4 and M5 and conserved boxes 2A to 2D.]

FIGURE 6.-Alignment of dpp sequences from the intron separating exons 2 and 3. The nucleotide sequences beginning 467 bp downstream of the exon 2 splice donor in D. melanogaster are aligned. D. melanogaster nucleotide 1 is Hin and shv region contig nucleotide 13941. No regions of extensive nucleotide conservation are noted in the interval between the splice donor and the beginning of the alignment. The last D. melanogaster nucleotide is 14787 of the Hin and shv region contig (403 bp upstream of the exon 3 splice acceptor). Nucleotides are numbered consecutively for each species and indicated at the left margin. Dashed lines indicate gaps required to demonstrate maximum identity between the sequences. A period indicates an identical nucleotide in that species and D. melanogaster. The boxed regions labeled S1 through S7 and M4 and M5 represent DL binding sites identified by DNAse footprinting (HUANG et al. 1993) in D. melanogaster. Additional DL binding sites (M1 through M3 which fall between the splice donor and the alignment and M6 and S8 which are between the alignment and the splice acceptor) reported by HUANG et al. (1993) are not conserved. Regions of extensive sequence conservation (longer than 20 bp in which ≥85% of the nucleotides are invariant) which are not part of DL binding sites are boxed. Box 2 (68 invariant nucleotides of 76) has four parts (2A, 2B, 2C and 2D) that are separated by the DL binding sites S6, M4 and M5. The only region of extensive nucleotide conservation in the interval between the end of the alignment and the splice acceptor (20 invariant nucleotides of 21) begins at D. melanogaster Hin and shv region contig nucleotide #14790. Extended runs of dinucleotide and trinucleotide repeats in the alignment are indicated by dashed underlines.


others have recently identified several of the proteins responsible for communicating the instructive content of the DPP signal to the nucleus of appropriate cells. A number of these proteins including the transmembrane receptors encoded by saxophone (BRUMMEL et al. 1994) and punt (LETSOU et al. 1995) and the cytoplasmic protein MAD (SEKELSKY et al. 1995; NEWFELD et al. 1996) are expressed in all cells. Further, it appears that the ubiquitously expressed MAD protein translocates from the cytoplasm to the nucleus in response to DPP signals (S. NEWFELD and W. GELBART, unpublished observations). These findings suggest that the restriction of the information contained in DPP signals occurs largely through the control of ligand expression. An extensive network of developmental controls appears to have evolved to modulate the complex pattern of dpp gene expression. In this report and RICHTER et al. (1997), we examine the dpp locus for evidence of selective constraint on the developmental control of dpp function.

Our comparison of nucleotide and amino acid sequences from the dpp locus of four Drosophila species suggests that there are four distinct types of regulation necessary for proper dpp function. The analyses detected strong selective constraints on the amino acid content of two distinct sections of the DPP protein. Conservation of the ligand region is essential to maintaining interactions with the appropriate receptors. The conservation of Domain 2 in the proregion (which does not interact with receptors) suggests the existence of posttranslational regulation. The identification of large blocks of invariant nucleotides in the 3' untranslated region of dpp exon 3 and the lack of synonymous substitutions in the coding region strongly imply that posttranscriptional regulation is important. Finally, the existence of large blocks of invariant nucleotides in an intron and in the nontranscribed disk region identifies important sequences in the complex transcriptional control apparatus suggested by the genetic and expression studies.

The highest level of amino acid conservation is observed in the ligand region. This region differs by <10% (including conservative substitutions) even between the most distant species. It is informative to compare this with the amino acid similarity seen between the three TGF-β family members in D. melanogaster. The ligand regions of DPP and 60A show 62% amino acid similarity, DPP and SCREW are 56% similar, and 60A and SCREW show 69% similarity (ARORA et al. 1994). Though the three genes are genetically and developmentally distinct from one another, these paralogous genes are extremely similar in the ligand region. These genes are nearly as similar to DPP as DPP is to its human homologues BMP2 (74% similar) and BMP4 (76% similar), which are able to substitute for DPP in vivo (PADGETT et al. 1993). One possibility is that selection is actively constraining the DPP amino acid sequence, thereby limiting cross-interactions of the DPP ligand with the receptors for 60A and/or SCREW.

The analysis of amino acid conservation also shows that there are distinct levels of constraint on different parts of the proregion. Domain 1 (excepting the signal sequence) and Domain 3 of the protein must be under reduced selective constraint. These two segments display extensive divergence between the species, in the form of amino acid substitutions and amino acid length variation. In this light, we note that DPP's human homologues BMP2 and BMP4 are significantly shorter than DPP. The shortest DPP in our Drosophila study is D. melanogaster at 588 amino acids while BMP2 is 397 amino acids and BMP4 is 409 amino acids (WOZNEY et al. 1988). In a comparison of D. melanogaster DPP with BMP2 and BMP4, the highly variable Domain 1 is completely absent in the human proteins. The first reliable region of alignment between the three proteins (excepting the signal sequence) is at amino acid 45 of BMP2, amino acid 50 of BMP4 and amino acid 219 of D. melanogaster DPP. The analysis of nucleotide polymorphism in D. melanogaster DPP reported by RICHTER et al. (1997) also implies that this region of the protein is under reduced selection.

Domain 3 also displays high levels of amino acid substitution and amino acid length variation, and we propose that this short nonconserved region may act as a physical spacer ensuring that the core of the ligand remains unaffected by the cleavage reaction. Again looking to DPP's human homologues for comparison, the distance between the cleavage site and the first cysteine of the ligand in both BMP2 and BMP4 is 11 amino acids shorter than the distance in D. melanogaster DPP. Even within insects there is great variability in the length of Domain 3. This region is 13 amino acids shorter in grasshopper DPP (NEWFELD and GELBART 1995). These comparisons with distant species are completely consistent with the results from our interspecific study and the intraspecific study of RICHTER et al. (1997) suggesting that Domain 1 and Domain 3 of the protein are under reduced levels of selective constraint. It is interesting to look beyond the DPP/BMP subfamily (postulated to be the most ancient TGF-β subgroup due to its presence in vertebrates and invertebrates) to other TGF-β family members. For example, Domain 3 of TGF-β1 (for which no invertebrate homologue is known) is 14 amino acids shorter than in D. melanogaster DPP (DERYNCK et al. 1985). Thus, it appears that these two hypervariable sections of Drosophila DPP (Domain 1 and Domain 3) have been expanded in the invertebrate lineage or eliminated in the vertebrate lineage.

Suggestions about the function of the conserved Domain 2 of the DPP proregion derive from studies of TGF-β1. In vitro studies show that the proregion is involved in ligand dimerization and secretion and, following secretion, the proregion remains associated with the ligand, possibly serving as a functional regulator of ligand activity (reviewed in MASSAGUE 1990). However, no similar observations have been reported for DPP.

The identification of large blocks of invariant nucleotides in the 3' untranslated region and the reduced amount of synonymous substitutions in the coding region strongly suggest that posttranscriptional regulation is also important for dpp function. However, no evidence is available from studies of other TGF-β family members to suggest what form this regulation might take. RNA in situ experiments (e.g., RAY et al. 1991) demonstrate that dpp mRNA is very short lived. Thus far, no connections between mRNA turnover and any dpp sequences have been made.

The aspect of dpp regulation for which the most molecular evidence exists is transcriptional. To date, three molecular interactions are well characterized in D. melanogaster: activation by Ultrabithorax in the embryonic midgut (MANAK et al. 1995), repression by DL during embryonic development (HUANG et al. 1993) and repression by EN during imaginal disk development (SANICOLA et al. 1995). Our observations of DL and EN binding sites suggest an extremely complex regulatory system: there are many more blocks of invariant nucleotides than are accounted for by in vitro DNAse footprinting experiments, and not all sites identified by in vitro footprinting are conserved.

The first observation is consistent with reporter gene studies indicating that there are also enhancer elements located in the intron (JACKSON and HOFFMANN 1994; R. PADGETT and W. GELBART, unpublished observations). The second observation suggests that not all sites that show in vitro footprinting are of equal importance in vivo. Perhaps the DL and EN binding sites that footprint but are not conserved are used in D. melanogaster only when another binding site has been affected by mutation. If we were to footprint D. pseudoobscura or D. virilis sequences with DL or EN, we might find distinct binding sites that also were not conserved. Mutations in existing binding sites, occurring independently in each Drosophila lineage after speciation, could account for the existence of distinct sets of nonconserved binding sites in each species.

From this perspective, we propose that a level of built-in redundancy allows proper regulation of dpp transcription in spite of a mutation in any specific binding site. A population study of even-skipped in D. melanogaster (LUDWIG and KREITMAN 1995) also concluded that binding site redundancy is an important aspect of transcriptional regulation. In other words, we suggest that evolution of regulatory sequences can occur in the absence of changes in gene expression. This model could be tested by determining whether dpp disk expression is similar in D. melanogaster and D. pseudoobscura, identifying EN binding sites in D. pseudoobscura that differ from those in D. melanogaster and then evaluating the ability of the nonconserved D. pseudoobscura EN binding sites to affect proper dpp expression in D. melanogaster. The model predicts that nonconserved D. pseudoobscura EN binding sites will accurately affect dpp expression. One corollary of this hypothesis is that when inferring functional constraint from sequence comparisons one cannot assume that nonconservation is evidence of unimportance.

In summary, our interspecific analysis of the dpp locus has identified significant selective constraint on the molecular evolution of both nucleotide and amino acid sequences. Our interpretation of the data is that there are four types of regulation required for proper dpp function, not all of which were previously known. This complex set of developmental controls appears to have evolved to prevent any deviations in dpp expression pattern (spatial, temporal or quantitative) and activity that could lead to significant phenotypic change.

We thank the many members of the GELBART laboratory who have contributed to this long-term sequencing project. In particular, we thank RON BLACKMAN for the genomic sequence of D. melanogaster dpp exon IB, for mapping the 5' end of transcript B and for valuable comments on the manuscript. WAYNE RINDONE and SUSAN RUSSO assisted with the computational analyses. We are grateful to DICK LEWONTIN for many valuable discussions. This work was supported by grants from the National Institutes of Health to W.M.G.

LITERATURE CITED

ALTSCHUL, S. F., W. GISH, W. MILLER, E. MYERS and D. J. LIPMAN, 1990 Basic local alignment search tool. J. Mol. Biol. 215: 403-410.

ARORA, K., M. LEVINE and M. O'CONNOR, 1994 The screw gene encodes a ubiquitously expressed member of the TGF-β family required for specification of dorsal cell fates in the Drosophila embryo. Genes Dev. 21: 2588-2601.

BEACHY, P., S. HELFAND and D. HOGNESS, 1985 Segmental distribution of bithorax complex proteins during Drosophila development. Nature 313: 545-551.

BLACKMAN, R., and M. MESELSON, 1986 Interspecific nucleotide sequence comparisons used to identify regulatory and structural features of the Drosophila hsp82 gene. J. Mol. Biol. 188: 499-515.

BLACKMAN, R., M. SANICOLA, L. A. RAFTERY, T. GILLEVET and W. M. GELBART, 1991 An extensive 3' cis-regulatory region directs the imaginal disk expression of decapentaplegic, a member of the TGF-β family in Drosophila. Development 111: 657-666.

BRUMMEL, T., V. TWOMBLY, G. MARQUES, J. WRANA, S. NEWFELD et al., 1994 Characterization and relationship of Dpp receptors encoded by the saxophone and thick veins genes in Drosophila. Cell 78: 251-261.

CAVENER, D., and S. RAY, 1991 Eukaryotic start and stop translation sites. Nucleic Acids Res. 19: 3185-3192.

CHILDS, S. R., J. L. WRANA, K. ARORA, L. ATTISANO, M. B. O'CONNOR et al., 1993 Identification of Drosophila activin receptor. Proc. Natl. Acad. Sci. USA 90: 9475-9479.

CURTIS, D., R. LEHMANN and P. ZAMORE, 1995 Translational regulation in development. Cell 81: 171-178.

DAOPIN, S., K. A. PIEZ, Y. OGAWA and D. R. DAVIES, 1992 Crystal structure of TGF-β2: an unusual fold for the superfamily. Science 257: 369-373.

DERYNCK, R., J. JARRETT, E. CHEN, C. EATON, J. BELL et al., 1985 Human transforming growth factor-beta cDNA sequence and expression in tumor cell lines. Nature 316: 701-705.

DEVEREUX, J., P. HAEBERLI and O. SMITHIES, 1984 A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12: 387-395.

DOVER, G., 1986 Molecular drive in multigene families: how biological novelties arise, spread and are assimilated. Trends Genet. 2: 159-165.

FINELLI, A., C. BOSSIE, T. XIE and R. PADGETT, 1994 Mutational analysis of the Drosophila tolloid gene, a human BMP-1 homolog. Development 120: 861-870.

GISH, W., and D. J. STATES, 1993 Identification of protein coding regions by database similarity searches. Nat. Genet. 3: 266-272.

HOFFMANN, F. M., and W. GOODMAN, 1987 Identification in transgenic animals of the Drosophila decapentaplegic sequences required for embryonic dorsal pattern formation. Genes Dev. 1: 615-625.

HUANG, J.-D., D. SCHWYTER, J. SHIROKAWA and A. J. COUREY, 1993 The interplay between multiple enhancer and silencer elements defines the pattern of decapentaplegic expression. Genes Dev. 7: 694-704.

HURSH, D. A., R. W. PADGETT and W. M. GELBART, 1993 Cross regulation of decapentaplegic and Ultrabithorax transcription in the embryonic visceral mesoderm of Drosophila. Development 117: 1211-1222.

IMMERGLUCK, K., P. LAWRENCE and M. BIENZ, 1990 Induction across germ layers in Drosophila mediated by a genetic cascade. Cell 62: 261-268.

IRISH, V. F., and W. M. GELBART, 1987 The decapentaplegic gene is required for dorsal-ventral patterning of the Drosophila embryo. Genes Dev. 1: 868-879.

JACKSON, D. P., and F. M. HOFFMANN, 1994 Embryonic expression patterns of the Drosophila decapentaplegic gene: separate regulatory elements control blastoderm expression and lateral ectodermal expression. Dev. Dynamics 199: 28-44.

KINGSLEY, D., 1994 The TGF-β superfamily: new members, new receptors, and new genetic tests of function in different organisms. Genes Dev. 8: 133-146.

KREITMAN, M., 1983 Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304: 412-417.

LETSOU, A., K. ARORA, J. L. WRANA, K. SIMIN, V. TWOMBLY et al., 1995 Drosophila Dpp signaling is mediated by the punt gene product: a dual ligand-binding Type II receptor of the TGF-β receptor family. Cell 80: 899-908.

LEWONTIN, R. C., 1989 Inferring the number of evolutionary events from DNA coding sequence differences. Mol. Biol. Evol. 6: 15-32.

LI, W.-H., C.-I. WU and C.-C. LUO, 1985 A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon usage. Mol. Biol. Evol. 2: 150-174.

LUDWIG, M., and M. KREITMAN, 1995 Evolutionary dynamics of the enhancer region of even-skipped in Drosophila. Mol. Biol. Evol. 12: 1002-1011.

MANAK, J. R., L. MATHIES and M. SCOTT, 1995 Regulation of a decapentaplegic midgut enhancer by homeotic proteins. Development 120: 3605-3619.

MASSAGUE, J., 1990 The transforming growth factor-β family. Annu. Rev. Cell Biol. 6: 597-641.

MASSAGUE, J., L. ATTISANO and J. WRANA, 1994 The TGF-β family and its composite receptors. Trends Cell Biol. 4: 172-178.

NEWFELD, S. J., A. T. SCHMID and B. YEDVOBNICK, 1993 Homopolymer length variation in the Drosophila gene mastermind. J. Mol. Evol. 37: 483-495.

NEWFELD, S. J., H. TACHIDA and B. YEDVOBNICK, 1994 Drive-selection equilibrium: homopolymer evolution in the Drosophila gene mastermind. J. Mol. Evol. 38: 637-641.

NEWFELD, S. J., and W. M. GELBART, 1995 Identification of two Drosophila TGF-β family members in the grasshopper Schistocerca americana. J. Mol. Evol. 41: 155-160.

NEWFELD, S. J., E. H. CHARTOFF, J. M. GRAFF, D. A. MELTON and W. M. GELBART, 1996 Mothers against dpp encodes a conserved cytoplasmic protein required in DPP/TGF-β responsive cells. Development 122: 2099-2108.

PADGETT, R., R. D. ST. JOHNSTON and W. M. GELBART, 1987 A transcript from a Drosophila pattern gene predicts a protein homologous to the transforming growth factor-β family. Nature 325: 81-84.

PADGETT, R., J. WOZNEY and W. M. GELBART, 1993 Human BMP sequences can confer normal dorsal-ventral patterning in the Drosophila embryo. Proc. Natl. Acad. Sci. USA 90: 2905-2909.

PANGANIBAN, G., R. REUTER, M. SCOTT and F. M. HOFFMANN, 1990 A Drosophila growth factor homolog, decapentaplegic, regulates homeotic gene expression within and between germ layers during midgut morphogenesis. Development 110: 1041-1050.

POOLE, S. J., 1995 Conservation of complex expression domains of the pdm-2 POU domain gene between Drosophila virilis and D. melanogaster. Mech. Dev. 49: 107-116.

POSAKONY, L. G., L. A. RAFTERY and W. M. GELBART, 1991 Wing formation in Drosophila melanogaster requires decapentaplegic function along the anterior-posterior compartment boundary. Mech. Dev. 33: 69-82.

PUSTELL, J., and F. C. KAFATOS, 1984 A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis and homology determination. Nucleic Acids Res. 12: 643-655.

RAY, R., K. ARORA, C. NUSSLEIN-VOLHARD and W. GELBART, 1991 The control of cell fate along the dorsal-ventral axis of the Drosophila embryo. Development 113: 35-54.

RILEY, M. A., 1989 Nucleotide sequence of the Xdh region in Drosophila pseudoobscura and an analysis of the evolution of synonymous codons. Mol. Biol. Evol. 6: 33-52.

RICHTER, B., M. LONG, R. C. LEWONTIN and E. NITASAKA, 1997 Nucleotide variation and conservation at the dpp locus, a gene controlling early development in Drosophila. Genetics 145: 311-323.

RUSSO, C., N. TAKEZAKI and M. NEI, 1995 Molecular phylogeny and divergence times of Drosophila species. Mol. Biol. Evol. 12: 391-404.

SAMBROOK, J., E. FRITSCH and T. MANIATIS, 1989 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

SAMPATH, T., K. RASHKA, J. DOCTOR, R. TUCKER and F. M. HOFFMANN, 1993 Drosophila TGF-β superfamily proteins induce endochondral bone formation in mammals. Proc. Natl. Acad. Sci. USA 90: 6004-6008.

SANICOLA, M., J. SEKELSKY, S. ELSON and W. M. GELBART, 1995 Drawing a stripe in Drosophila imaginal disks: negative regulation of decapentaplegic and patched expression by engrailed. Genetics 139: 745-756.

SCHLUNEGGER, M. P., and M. G. GRUTTER, 1992 An unusual feature revealed by the crystal structure at 2.2 Å resolution of human TGF-β2. Nature 358: 430-434.

SEGAL, D., and W. M. GELBART, 1985 Shortvein, a new component of the decapentaplegic gene complex in Drosophila melanogaster. Genetics 109: 119-143.

SEKELSKY, J. J., S. J. NEWFELD, L. A. RAFTERY, E. H. CHARTOFF and W. M. GELBART, 1995 Genetic characterization and cloning of Mothers against dpp, a gene required for decapentaplegic function in Drosophila melanogaster. Genetics 139: 1347-1358.

SMITH, R. F., and T. F. SMITH, 1990 Automatic generation of primary sequence patterns from sets of related protein sequences. Proc. Natl. Acad. Sci. USA 87: 118-122.

ST. JOHNSTON, D., 1995 The intracellular localization of messenger RNAs. Cell 81: 161-170.

ST. JOHNSTON, R. D., F. M. HOFFMANN, R. BLACKMAN, D. SEGAL, R. GRIMAILA et al., 1990 Molecular organization of the decapentaplegic gene in Drosophila melanogaster. Genes Dev. 4: 1114-1127.

THUMMEL, C., 1993 Compilation of Drosophila cDNA and genomic libraries. Dros. Info. Serv. 72: 180-183.

TWOMBLY, V., R. BLACKMAN, H. JIN, J. GRAFF, R. PADGETT et al., 1996 The TGF-β signaling pathway is essential for Drosophila oogenesis. Development 122: 1555-1565.

WELLS, R. S., 1995 Sequence and evolution of the Drosophila pseudoobscura glycerol-3-phosphate dehydrogenase locus. J. Mol. Evol. 41: 886-893.

WELLS, R. S., 1996 Nucleotide variation at the Gpdh locus in the genus Drosophila. Genetics 143: 375-384.

WHARTON, K., B. YEDVOBNICK, V. FINNERTY and S. ARTAVANIS-TSAKONAS, 1985 opa: a novel family of transcribed repeats shared by the Notch locus and other developmentally regulated loci in D. melanogaster. Cell 40: 55-62.

WHARTON, K., G. THOMSEN and W. M. GELBART, 1991 Drosophila 60A gene, another transforming growth factor beta family member, is closely related to human bone morphogenetic proteins. Proc. Natl. Acad. Sci. USA 88: 9214-9218.

WHARTON, K., R. P. RAY, S. FINDLEY, H. DUNCAN and W. M. GELBART, 1996 Molecular lesions associated with alleles of decapentaplegic identify residues necessary for TGF-β/BMP cell signaling in Drosophila melanogaster. Genetics 142: 493-505.

WOZNEY, J., V. ROSEN, A. CELESTE, L. MITSOCK, M. WHITTERS et al., 1988 Novel regulators of bone formation: molecular clones and activities. Science 242: 1528-1534.

ZHOU, L., and G. L. BOULIANNE, 1994 Comparison of the neuralized genes of Drosophila virilis and D. melanogaster. Genome 37: 840-847.

Communicating editor: A. G. CLARK