

Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 5098-5102, June 1992
Evolution

Diversification of the Wnt gene family on the ancestral lineage of vertebrates

(molecular evolution/developmental regulator genes/gene duplication/evolutionary rates/jawed vertebrates)

AREND SIDOW
Department of Molecular and Cell Biology, 401 Barker Hall, University of California, Berkeley, CA 94720

Communicated by Harold E. Varmus, March 11, 1992

ABSTRACT Diversification of the Wnt genes, a family of powerful developmental regulator molecules, is inferred by molecular evolutionary analyses. Fifty-five recently determined partial sequences from a variety of vertebrates and invertebrates, together with 17 published sequences, mostly from the mouse and Drosophila melanogaster, are analyzed. Wnt-1 through -7 originated before the last common ancestor of arthropods and deuterostomes lived. Another round of gene duplication, involving Wnt-3, -5, -7, and -10, occurred after the echinoderm lineage arose, on the ancestral lineage of jawed vertebrates. Increased constraints were imposed on the Wnt genes when jawed vertebrates originated, as indicated by an overall 4-fold lower rate of amino acid replacements in jawed vertebrates compared with invertebrates and jawless vertebrates. The Wnt genes are thus inferred to have undergone a disproportionately high amount of structural and functional evolution in the relatively short time (~100 million years) between the origin of the echinoderm lineage and the first diversification of jawed vertebrates. A model is presented for the relationship of functional diversification of developmental regulators and their rates of amino acid replacement.

The molecular characterization of metazoan development is beginning to provide a basis for identifying the forces that contribute to the evolution of organismal complexity. Diversification (i.e., duplication and/or regulatory evolution) of developmental control genes is likely to be an important factor in this process. The Wnt genes appear to fulfill three criteria that are necessary for addressing whether their evolution could have contributed to the evolution of increasingly complex animal development. (i) They are expressed and function during early embryonic development in a wide variety of animals, notably Drosophila (1-3), mouse (4-11), and Xenopus (12-16). (ii) At least one tissue in which many Wnt genes are expressed, the developing central nervous system, has increased dramatically in complexity several times during animal evolution. (iii) The coding sequences of the Wnt genes are amenable to molecular evolutionary analyses and provide enough data for statistical tests of specific evolutionary hypotheses.

Fifty-five Wnt-related sequences gathered for this study (Fig. 1)* along with 17 full-length sequences from the mouse (6, 8, 9, 17) and other organisms (12, 18-21, 35; A. McMahon, personal communication) were analyzed with maximum-likelihood methods of phylogenetic inference (22-26). Wnt-9, -10a, and -10b represent additional homologs (this study and A. McMahon, personal communication), which increases the number of known Wnt genes to 14. The analyses presented here do not include Wnt-8 and -9 because only two sequences each were available from these two homologs. Diversification of the Wnt genes was addressed by mapping the duplications that gave rise to the Wnt gene family onto a species tree of insects, echinoderms, and both jawless and jawed vertebrates and by analyzing the rates of amino acid replacement substitution.

MATERIALS AND METHODS

Sequence Determination. Small amounts of liver or muscle tissue were finely minced or ground up under liquid nitrogen. Total genomic DNA was isolated by overnight extraction at 37°C in an EDTA/SDS/protease K buffer, extracted with phenol and chloroform, and spin-dialyzed (Centricon 30). Twenty-microliter PCRs contained 10-100 ng of genomic DNA, 10 mM Tris (pH 8.3 at 25°C), 10% (vol/vol) glycerol, bovine serum albumin at 1 mg/ml (Sigma), 50 mM KCl, 2 mM MgCl2, 0.8 mM total dNTP (Pharmacia), 1 µM 5'- and 3'-primers with 5'-phosphate moieties, and 1 unit of Taq polymerase (Perkin-Elmer/Cetus or Boehringer Mannheim). Primers were degenerate to match the following amino acid sequences: 5'-primers: CKCHGV, CKCHGI, KCKCHG, QECKCHG, ITCKCHG. The base at the 3'-end of these primers was on the second position of the corresponding codon. The 3'-primers were HWCC(F/Y)V, HWCC(A/V)V. The base at the 3'-end of these primers was on the first position of the histidine codon. After thermocycling in a Perkin-Elmer/Cetus cycler (with a 2-min denaturation step at 92°C preceding 30 cycles of 40 s at 92°C, 1 min at 47°C to 50°C, and 1 min at 72°C), 1 unit of Klenow DNA polymerase (United States Biochemical) was added to the reaction, which was then incubated for 1 hr at 37°C and separated in a 3% NuSieve (FMC) agarose gel containing ethidium bromide at 1 µg/ml. Products of the expected size [350-500 bp (base pairs)] as well as other prominent bands were isolated as gel slices. UV-exposure times were kept to an absolute minimum to avoid DNA damage. Individual bands were purified by using GeneClean (Bio 101, La Jolla, CA). Ligations into Sma I-cut, dephosphorylated M13 mp10 (Amersham), preparation and transformation of DnD-competent Escherichia coli DH5αF'IQ (BRL), isolation of transformants, and preparation of single-stranded phage DNA were done according to standard protocols (27) with some minor modifications. Sequencing was done by using the Sequenase kit (United States Biochemical) according to the manufacturer's instructions, except that all reaction volumes were half of those recommended. All sequences shown were determined independently at least twice, most often more than three times, to identify and eliminate errors introduced during amplification and cloning. Error rates were <1/1000. Approximately 1700 clones were sequenced, about half of which were derived from Wnt gene family members.
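The degeneracy implied by the 5'-primer peptide targets listed above can be illustrated with a short calculation. The Python sketch below is an illustration only and is not part of the original methods. It assumes the standard genetic code and full degeneracy at every position; the function name `primer_degeneracy` and its `bases_in_last_codon` parameter are hypothetical. The paper does not give the primer nucleotide sequences or say whether degeneracy was reduced (for example, with inosine), so the counts are upper bounds under these assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def primer_degeneracy(peptide: str, bases_in_last_codon: int = 3) -> int:
    """Count the distinct nucleotide sequences that encode `peptide`.

    `bases_in_last_codon` truncates the final codon, e.g. 2 when the
    3'-end of the primer sits on the second codon position.
    """
    if bases_in_last_codon not in (1, 2, 3):
        raise ValueError("bases_in_last_codon must be 1, 2, or 3")
    total = 1
    for index, aa in enumerate(peptide.upper()):
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        is_last = index == len(peptide) - 1
        keep = bases_in_last_codon if is_last else 3
        # Distinct prefixes of the synonymous codons at this position.
        total *= len({codon[:keep] for codon in codons})
    return total


if __name__ == "__main__":
    for target in ("CKCHGV", "CKCHGI", "KCKCHG", "QECKCHG", "ITCKCHG"):
        print(target, primer_degeneracy(target, bases_in_last_codon=2))
```

Under these assumptions the five 5'-primer targets give 64, 64, 32, 64, and 192 possible sequences, respectively. Ending each primer on the second position of the last codon removes the degeneracy of that codon's third position.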

Sequence Analysis. An initial alignment of Wnt sequences obtained with the program ALIGN (28) was refined by hand, and positions of uncertain alignment, particularly in and

Abbreviations: Mya, million years ago; Myr, million years.
*The sequences reported in this paper have been deposited in the GenBank data base (accession nos. M91250-M91308).


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


FIG. 1. Alignment of sequences determined specifically for this study (*) with previously identified sequences. Predicted amino acid sequences are shown in the single-letter code. Dots denote identities to the first sequence of a given paralog; hyphens indicate gaps in alignment. This region is homologous to most of the coding region of exon 4 of mouse Wnt-1. Overlined sections were used for analyzing evolutionary rates (Table 1). Species: Mo, mouse (Mus domesticus); Hu, human (Homo sapiens); Gs, gopher snake (Pituophis melanoleucus); Ws, western skink (Eumeces skiltonianus); Wl, western fence lizard (Sceloporus occidentalis); Tu, turkey (Meleagris gallopavo); Sg, snow goose (Chen caerulescens); ST, amino acid sequences identical for snow goose and turkey; Ax, axolotl (Ambystoma mexicanum); Xe, Xenopus laevis; Sa, salamander (Plethodon jordani); Zf, zebrafish (Brachydanio rerio); Tf, tunafish (Thunnus sp.); Sm, Pacific salmon (Oncorhynchus sp.); Sh, thresher shark (Alopias vulpinus); Hf, hagfish (Eptatretus stouti); Su, sea urchin (Strongylocentrotus purpuratus); Sf, starfish (Evasterias troschelii); Dm, Drosophila melanogaster. Two orthologs each of Wnt-8 and -9 were also isolated but are irrelevant here and are therefore not shown. For simplicity, insertions in single sequences [for example, the large insertion in Dm-1 (wingless)] are not shown, but their locations are identified with arrowheads. The affected sequences are Dm-1, Sf-, Hf-7II, Dm-2, Sa-10a, and Sh-10a. Su-4 and Hf-4 contain introns that were removed on the basis of consensus splice junctions and this alignment (arrowheads).

around gaps, were omitted from any further analysis. Nucleotide sequences were aligned with the program MAKEINF† (unpublished work) by using the amino acid alignment as a template. Silent substitutions in first positions of codons were eliminated by converting first positions of arginine codons from adenine or cytosine to M and those of leucine codons from thymine or cytosine to Y. The actual phylogenetic analysis was performed on the first and second positions of codons by using the DNAML (DNA maximum-likelihood) program in PHYLIP 3.3 (22). DNAML implements a six-parameter stochastic model of DNA sequence evolution (23-25) appropriate for sequence analyses of this kind (26). Option U (paired-sites test of user-specified trees) was in effect when trees were tested for statistical significance, and option L (user-specified branch lengths) was in effect for tests of evolutionary rates. Options W (exclude third positions) and T (transition/transversion ratio = 1.5) were always in effect. The mean base frequencies in first and second positions of codons were calculated by using MAKEINF (unpublished work), with two-thirds of all Y's (leucine, first position) and M's (arginine, first position) being counted as cytosine and each remaining third being counted as thymine or adenine, respectively. The trees for Wnt-3 and Wnt-7 were rooted with Wnt-1, that for Wnt-5 was rooted with Wnt-4, and that for Wnt-10 was rooted with Wnt-2. The choice of gene to use for placement of a root was determined by the alignment in the length-variable central region (Fig. 1) and by considerations of minimizing computer time. The number of codons used was 287 for the analyses involving all full-length sequences and between 108 and 120 for the analyses addressing the duplications of Wnt-3, -5, -7, and -10.

The statistical tests for differences in evolutionary rates were done as follows. The number of substitutions along each branch (branch lengths) of the best tree was estimated by DNAML. For each set of orthologous sequences from one particular Wnt gene, the best tree was tested against trees with the same branching order, but in which the overall rate of evolution in jawed vertebrates was forced to be the same as that in the hagfish and invertebrates. Rates were forced to be equal by multiplying lengths of the vertebrate branches of the best tree with a and multiplying those of the invertebrate and hagfish branches with b, where a > 1 > b (J. Felsenstein, personal communication). Seven to 10 such trees were calculated and tested for each paralog to ensure that values for a and b had been found that gave the highest likelihood. For calculating the elapsed amount of evolutionary time, lineages were assumed to have originated at the following times (29-31): arthropods (Drosophila), 600 million years ago (Mya); echinoderms, 555 Mya; hagfish, 510 Mya; cartilaginous fishes, 440 Mya; bony fishes, 420 Mya; amphibians, 365 Mya; mammals, 300 Mya; birds and reptiles, 255 Mya. Numbers of substitutions and of years were added up along all relevant lineages. Branches between the duplications that gave rise to the a/b paralogs and the divergence of the shark were not included. If more than one representative of each major branch was present (e.g., both starfish and sea urchin for echinoderm Wnt-7), branch lengths were averaged and counted only once.

†Program MAKEINF is available from the author by sending e-mail to [email protected].
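The first-position recoding and base-frequency weighting described under Sequence Analysis can be sketched in a few lines of Python. This is an illustrative reconstruction from the text, not the MAKEINF program, which is unpublished. The function names are hypothetical, the standard genetic code is assumed, and M (A or C) and Y (C or T) are the usual IUPAC ambiguity codes.

```python
# Six-fold degenerate amino acids whose first codon position can change silently.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
LEU_CODONS = {"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"}


def recode_first_and_second(cds: str) -> str:
    """Return first and second codon positions, with Arg/Leu first positions recoded.

    Arginine first positions (A or C) become M, leucine first positions
    (T or C) become Y, and third positions are dropped.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        if codon in ARG_CODONS:
            first = "M"
        elif codon in LEU_CODONS:
            first = "Y"
        else:
            first = codon[0]  # gap characters pass through unchanged
        recoded.append(first + codon[1])
    return "".join(recoded)


def mean_base_frequencies(recoded: str) -> dict:
    """Base frequencies with Y and M counted two-thirds C, one-third T or A."""
    counts = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0}
    for base in recoded:
        if base == "Y":
            counts["C"] += 2 / 3
            counts["T"] += 1 / 3
        elif base == "M":
            counts["C"] += 2 / 3
            counts["A"] += 1 / 3
        elif base in counts:
            counts[base] += 1.0
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no countable bases in input")
    return {base: n / total for base, n in counts.items()}


if __name__ == "__main__":
    example = "CGTAGATTGCTCATG"  # Arg, Arg, Leu, Leu, Met (made-up sequence)
    recoded = recode_first_and_second(example)
    print(recoded)
    print(mean_base_frequencies(recoded))
```

In the made-up example, the two arginine codons CGT and AGA both become MG and the two leucine codons TTG and CTC both become YT. A silent first-position change within either amino acid therefore no longer registers as a substitution.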

RESULTS AND DISCUSSIONEarly Gene Duplications. Wnt-J has previously been sug-

gested to be orthologous (related by organismal phylogeny; asopposed to paralogous, related by gene duplication) to Droso-phila's wingless gene on the basis of conserved cysteines (8).Two more Wnt-related genes have been isolated from Dro-sophila (sequences for Dm-2 and Dm-3; ref. 35; A. McMahon,personal communication), which appeared closely related tomouse Wnt-Sa/Sb and Wnt-7a/7b in initial sequence align-ments. Because the duplications of Wnt-S and Wnt-7 that gaverise to Wnt-Sa/Sb and -7a/7b occurred after the Drosophilalineage originated (see below), there are only three possibleunrooted trees for relating Dm-2, Dm-3, Wnt-S, and Wnt-7.One ofthese three trees (Fig. 2a) is significantly better than theother two in paired-sites tests (P < 0.01). To ascertain whetherWnt-2, -3, 4, and -6 originated after or before divergence oftheDrosophila lineage, I individually tested their positions on thistree. In all cases, the genes fall on the central branch (arrowin Fig. 2a). Al but 1 of 16 alternative assignments can be ruledout at statistical significance (P < 0.05), which indicates thatDrosophila sequences Dm-2 and Dm-3 are orthologous tomouse Wnt-7a and b, and Wnt-Sa and b, respectively. Unlessgenes were lost during arthropod evolution, orthologs ofWnt-2, -3, 4, and -6 should therefore also be present in extantarthropods like Drosophila.The inferred relationship of the Drosophila sequences with

their orthologs from the mouse is further corroborated by amaximum-likelihood search for the best tree, in which all 17available full-length sequences were used in a single analysis(Fig. 2b; the tree was rooted by the midpoint method forclarity only). The topology of gene duplications before thedivergence of the lineages leading to vertebrates and arthro-pods predicts that Wnt-J through -7 were all present in the lastcommon ancestor ofarthropods and vertebrates. Because theprecise arrangement of these six duplications is not statisti-cally significant in paired-sites tests, it is possible that Wnt-2,-3, 4, or -6 shared common ancestors on the lineage leading

a Wnt-7a/b lWnt-5a/bWnt-1, -2,>-L-< -3a/b 4 -6

Dm-2 Dm-3

b

wg 1

a

'aI7b'

/ 5'a5Db I6 Dm-3 Dm-2

FIG. 2. (a) Drosophila Dm-2 is orthologous to mouse Wnt-7a/b,and Drosophila Dm-3 is orthologous to mouse Wnt-Sa/b. MouseWnt-1, -2, -3a/b, 4, and -6 fall onto the central branch of this tree(arrow) at significance levels of P < 0.05 in paired-sites tests. (b)Maximum-likelihood tree for all 17 available full-length Wnt se-quences. One set of gene duplications (K) occurred before the lastcommon ancestor of vertebrates and arthropods (e) lived and an-other set of gene duplications occurred thereafter (*). Branchlengths are proportional to the estimated number of substitutions inthe vertical direction. Rooting was done by the midpoint method forclarity. Numbers on terminal branches correspond to the availableWnt sequences from vertebrates, mostly from the mouse. wg,wingless gene of Drosophila.

to vertebrates. It should be noted, however, that the tree inFig. 2b is also the single most parsimonious, as determinedfrom an exhaustive search of all 945 possible bifurcatingarrangements among the seven Wnt genes. The initial diver-sification of the Wnt gene family (Fig. 2b, open diamonds),therefore, happened >600 Mya, before arthropod and deu-terostome lineages diverged.Gene Duplications of Wnt-3, -5, -7, and -10. To determine

when in animal phylogeny the gene duplications that gave riseto the a/b paralogous genes (Fig. 2b, black diamonds)occurred, it was first necessary to obtain Wnt-related se-quences from a number of vertebrate species and two echin-oderms (Fig. 1). For consistency with Wnt-5 and -7, I refer tomouse Wnt-3 (7, 9, 32) as Wnt-3b, to distinguish it from Wnt-3in those organisms whose lineages split off before duplicationof Wnt-3 into two homologs. For Wnt-3a, -3b, -Sa, -Sb, -7a,-7b, and -JOa, orthologous sequences were isolated from thethresher shark, indicating that these genes duplicated beforejawed vertebrates first diversified 400-450 Mya. Orthologoussequences of Wnt-3 and -7 were found in echinoderms.Paired-sites tests indicate at statistical significance (P < 0.05)that the gene duplications that gave rise to Wnt-3a and -3b andto Wnt-7a and -7b occurred on the vertebrate lineage afterechinoderms had diverged (Fig. 3). Further resolution, al-though not at statistical significance, is provided by orthol-ogous sequences from the hagfish, a primitive jawless ver-tebrate: in the Wnt-3-tree with the highest likelihood, theduplication is placed after divergence of the lineages leadingtojawed and jawless vertebrates (see Fig. 3). Two sequencesclosely related to Wnt-7 were also isolated from the hagfish(Fig. 1). Because the trees relating these sequences to Wnt-7are very similar in their likelihoods, their precise relation-ships with Wnt-7a and -7b could not be determined.As for Wnt-3 and -7, sequences from the shark provide a

lower (most recent) limit for the duplication of Wnt-S. Theupper limit was provided by the Drosophila sequence Dm-3because no orthologs of Wnt-S could be isolated from echin-oderms. Surprisingly, though, three sequences closely re-lated to Wnt-5 were found in the hagfish (Fig. 1). An initialanalysis of unrooted trees with paired-sites tests grouped the

5100 Evolution: Sidow

Dow

nloa

ded

by g

uest

on

May

12,

202

1

Page 4: Diversification ofthe Wntgenefamilyonthe ancestral lineage ... · lineage on (P

Proc. Natl. Acad. Sci. USA 89 (1992) 5101

600 a/b paralogs

I51-10_440 I I I *X*,< increased

constraints420

365

255~~~~~~~j(7)

maDroso- Echino- Hag- Cf Bf A M R Bmaphila derms fish I

Jawed Vertebrates

FIG. 3. Wnt-3, -5, -7, and -JO duplicated on the ancestral lineageof jawed vertebrates (dashed branch), on which also increasedconstraints were imposed on evolution of the Wnt genes. The onlyother gene duplications detected within this phylogeny were twoindependent duplications of Wnt-5 and, possibly, one independentduplication of Wnt-7 on the hagfish lineage. Branch lengths in thevertical direction are proportional to time. Dates when lineagesoriginated are given at left in Mya. Cf, cartilaginous fishes; Bf, bonyfishes; A, amphibians; M, mammals; R, reptiles; B, birds.

three hagfish sequences together at statistical significance (P < 0.05) to the exclusion of Wnt-5a and -5b. Thus, none of the hagfish sequences are orthologous to either Wnt-5a or -5b. When Wnt-4 and Dm-3 were included as outgroups, the root fell on the branch that separates the hagfish sequences and the Wnt-5a/b group. This result suggests that Wnt-5 duplicated once on the ancestral lineage of jawed vertebrates and twice on the hagfish lineage (Fig. 3).
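The paired-sites comparisons used above can be illustrated with a minimal sketch. It assumes a test in the style of Kishino and Hasegawa (25): the per-site log-likelihoods of two candidate trees are differenced, and the summed difference is compared with its estimated standard error. The function name, the example values, and the 1.96 cutoff (a two-sided 5% level under a normal approximation) are illustrative assumptions, not the exact procedure or data of this study.

```python
import math
from typing import Sequence, Tuple


def paired_sites_test(
    loglik_tree_a: Sequence[float], loglik_tree_b: Sequence[float]
) -> Tuple[float, float, float]:
    """Compare two trees from their per-site log-likelihoods.

    Returns (delta, standard_error, z), where delta is the summed
    per-site difference (tree A minus tree B).
    """
    if len(loglik_tree_a) != len(loglik_tree_b):
        raise ValueError("both trees must be scored on the same sites")
    n = len(loglik_tree_a)
    if n < 2:
        raise ValueError("at least two sites are needed")

    # Per-site differences in log-likelihood between the two trees.
    diffs = [a - b for a, b in zip(loglik_tree_a, loglik_tree_b)]
    delta = sum(diffs)
    mean_diff = delta / n

    # Estimated variance of the summed difference over n sites.
    variance_of_sum = n / (n - 1) * sum((d - mean_diff) ** 2 for d in diffs)
    standard_error = math.sqrt(variance_of_sum)
    if standard_error == 0.0:
        raise ValueError("per-site differences show no variation")

    return delta, standard_error, delta / standard_error


# Hypothetical per-site log-likelihoods for four alignment sites.
tree_a = [-1.0, -2.0, -1.5, -3.0]
tree_b = [-1.2, -2.5, -1.6, -3.9]
delta, se, z = paired_sites_test(tree_a, tree_b)

# Assumed decision rule: normal approximation, two-sided 5% level.
tree_a_significantly_better = z > 1.96
```

In practice the per-site values would come from a maximum-likelihood program run on the real alignment; four sites are used here only to keep the arithmetic easy to follow.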

Wnt-10 is a paralog just isolated in this study. In the best tree for the Wnt-10-related sequences, the root falls on the hagfish lineage, such that the duplication into Wnt-10a and -10b also occurs on the ancestral lineage of jawed vertebrates (Fig. 3).

Evolutionary Rates of the Wnt Genes. Analysis of evolutionary rates for the sequences in Fig. 1 shows that the Wnt genes have experienced a 4-fold overall slowdown in the rate of amino acid replacement substitution since the origin of jawed vertebrates (Table 1). Importantly, in the available full-length sequences, the segment used in this study (Fig. 1) has evolved at approximately the same rate as all 287 unambiguously alignable codons of the entire coding region. The paralogs range in their deceleration from ≈2-fold to 15-fold (Table 1). These rate differences are significant in paired-sites tests for all paralogs except Wnt-6. The times of divergence used for estimating the evolutionary rates are partly based on current knowledge of the fossil record (29, 30) and on previous molecular evolutionary analyses (31). For vertebrates, the most recent likely dates of divergence were used (see Materials and Methods), so that estimates of evolutionary rates would not be artifactually biased toward low values.

The significant differences in rates of amino acid replacement substitution are not due to slowdowns or accelerations in the overall rate of molecular evolution in specific organisms. Calmodulin, for which sequences are available from Drosophila and several vertebrates,† evolves at a rate similar to that of Wnt-3b in jawed vertebrates. Calmodulin exhibits no significant difference in the rate of replacement substitution in jawed vertebrates (0.0090 substitutions per Myr per 298 sites) versus invertebrates (0.0119 substitutions per Myr per 298 sites). First and second positions of codons in another gene, glutamine synthetase, have previously been shown to evolve at the same rate in Drosophila as in mammals (31). Gene conversions, which would reduce differences among the Wnt genes, have not occurred: in comparisons of all paralogs from any given species, third positions of codons are saturated with substitutions, and the individual genes have retained their differences at the amino acid level throughout the phylogeny (see Fig. 1). The lower rates of amino acid replacement substitution are thus best explained by additional or more stringent evolutionary constraints on the Wnt genes in jawed vertebrates (Fig. 3).

A Model for Rate Variation in the Evolution of Developmental Regulators. Two scenarios can explain a systematic slowdown in the evolutionary rates of amino acid substitution in genes that control development (Fig. 4). Both of them are based on the finding that a slightly deleterious mutation can still be fixed in a population when the selection coefficient is smaller than the inverse of the effective population size (33). Such a mutation is termed "effectively neutral." Given the slow rate of evolution of the Wnt genes (Table 1), as compared with the neutral substitution rate of about one substitution per site per 100 Myr, this is probably the circumstance under which the few amino acid replacements are fixed.
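A short worked calculation may make the comparison with the neutral rate concrete. The sketch below converts the pooled rates of Table 1 (given per 232 first and second codon positions) to per-site rates and compares them with the quoted neutral rate of about one substitution per site per 100 Myr. The helper names, the effective population size, and the selection coefficients in the usage lines are hypothetical; the neutrality criterion is coded as stated in the text (selection coefficient smaller than the inverse of the effective population size).

```python
SITES_IN_SEGMENT = 232                   # first plus second codon positions (Table 1 legend)
NEUTRAL_RATE_PER_SITE_PER_MYR = 1 / 100  # "about one substitution per site per 100 Myr"


def per_site_rate(rate_per_segment: float, sites: int = SITES_IN_SEGMENT) -> float:
    """Convert a rate per segment per Myr into a rate per site per Myr."""
    return rate_per_segment / sites


def fold_below_neutral(rate_per_segment: float) -> float:
    """How many times slower than the neutral rate a given segment rate is."""
    return NEUTRAL_RATE_PER_SITE_PER_MYR / per_site_rate(rate_per_segment)


def is_effectively_neutral(selection_coefficient: float,
                           effective_population_size: float) -> bool:
    """Criterion from the text: s is smaller than 1/Ne."""
    return abs(selection_coefficient) < 1.0 / effective_population_size


# Pooled rates from the "Total" row of Table 1 (substitutions per Myr per 232 sites).
fold_jawed = fold_below_neutral(0.024)   # jawed vertebrates
fold_other = fold_below_neutral(0.097)   # jawless vertebrates and invertebrates

# Hypothetical values, chosen only to show both outcomes of the criterion.
hypothetical_ne = 1e4
weak = is_effectively_neutral(1e-5, hypothetical_ne)     # below 1/Ne
strong = is_effectively_neutral(1e-3, hypothetical_ne)   # above 1/Ne
```

Under these figures the Wnt segment evolves roughly two orders of magnitude more slowly than the neutral rate in jawed vertebrates, which is consistent with the statement that only a few replacements are fixed.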

†GenBank accession numbers for calmodulin sequences are Y00133 (Drosophila), M36167 (chicken), M36178 (electric eel), M65156 (Xenopus), and J04046 (human).

Table 1. Rates of evolution in Wnt genes

          Jawed vertebrates                        Jawless vertebrates and invertebrates    Level of significance
Gene      Substitutions  Time, Myr  Rate, s/Myr    Substitutions  Time, Myr  Rate, s/Myr    P
Wnt-1                84      1,965        0.043              112      1,315        0.085    <0.05
Wnt-3a               36      1,800        0.020                                             <0.003
Wnt-3                                                        132      1,180        0.112
Wnt-3b               13      1,800        0.007                                             <0.003
Wnt-4                35      1,430        0.025               71      1,110        0.064    <0.05
Wnt-5a               19      1,435        0.013                                             <0.003
Wnt-5                                                        145      1,270        0.114
Wnt-5b               31      1,545        0.020                                             <0.003
Wnt-6                61      1,205        0.051               68        690        0.098    NS
Wnt-7a               29      1,545        0.019                                             <0.003
Wnt-7                                                        191      1,825        0.105
Wnt-7b               31      1,435        0.022                                             <0.003
Total               339     14,160        0.024              719      7,390        0.097    ND

Evolutionary rates computed for the Wnt gene segments in Fig. 1; only overlined codons are used to ensure compatibility between comparisons. The rate per 232 first and second positions of codons was calculated by dividing the total number of replacement substitutions (s) in first plus second positions of codons by the total amount of evolutionary time in millions of years (Myr). Wnt-2 and Wnt-10 were not analyzed because sequences of crucial taxa were not available for either. The P values at which rate differences are significant are indicated at right. NS, not significant; ND, not determined.
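The rate calculation described in the legend can be reproduced from the table entries. The sketch below divides substitutions by evolutionary time, pools the rows into the two totals, and derives the fold slowdown. The dictionary layout, the function names, and the pairing of each a/b paralog with the unduplicated gene of the same number (for example Wnt-3b with Wnt-3) are my reading of the table, stated here as assumptions.

```python
from typing import Dict, Tuple

# (replacement substitutions, evolutionary time in Myr), copied from Table 1.
JAWED: Dict[str, Tuple[int, int]] = {
    "Wnt-1": (84, 1965), "Wnt-3a": (36, 1800), "Wnt-3b": (13, 1800),
    "Wnt-4": (35, 1430), "Wnt-5a": (19, 1435), "Wnt-5b": (31, 1545),
    "Wnt-6": (61, 1205), "Wnt-7a": (29, 1545), "Wnt-7b": (31, 1435),
}
JAWLESS_AND_INVERTEBRATE: Dict[str, Tuple[int, int]] = {
    "Wnt-1": (112, 1315), "Wnt-3": (132, 1180), "Wnt-4": (71, 1110),
    "Wnt-5": (145, 1270), "Wnt-6": (68, 690), "Wnt-7": (191, 1825),
}


def rate(substitutions: int, time_myr: int) -> float:
    """Substitutions per Myr per 232 first and second codon positions."""
    return substitutions / time_myr


def pooled_rate(rows: Dict[str, Tuple[int, int]]) -> float:
    """Total substitutions divided by total time, as in the 'Total' row."""
    total_subs = sum(subs for subs, _ in rows.values())
    total_time = sum(time for _, time in rows.values())
    return total_subs / total_time


def fold_slowdown(jawed_gene: str) -> float:
    """Rate outside jawed vertebrates divided by the rate within them.

    Assumption: an a/b paralog is compared with the unduplicated gene
    obtained by dropping the trailing letter (Wnt-3b -> Wnt-3).
    """
    reference = jawed_gene.rstrip("ab")
    return rate(*JAWLESS_AND_INVERTEBRATE[reference]) / rate(*JAWED[jawed_gene])


overall_jawed = pooled_rate(JAWED)
overall_other = pooled_rate(JAWLESS_AND_INVERTEBRATE)
overall_fold = overall_other / overall_jawed
```

With these inputs the pooled rates round to the 0.024 and 0.097 of the Total row, and their ratio is close to the 4-fold slowdown reported in the text; `fold_slowdown("Wnt-1")` and `fold_slowdown("Wnt-3b")` bracket the ≈2-fold to 15-fold range quoted for individual paralogs.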


[Figure 4 graphic not reproduced. Columns: amino acid replacement; rate of evolution. Top: effectively neutral, high. Middle: deleterious, low. Bottom: deleterious, low.]

FIG. 4. Model of how increased constraints may be imposed on developmental regulators, thereby slowing their evolutionary rate. The Wnt gene itself may functionally diversify such that it is involved in the developmental pathways of more traits than before (center) or genes acting downstream of the Wnt gene in the developmental pathway may diversify so that the trait becomes more complex (bottom). In either case, because the deleterious effect of a mutation is amplified, a smaller fraction of slightly deleterious replacement substitutions in the Wnt gene is selectively neutral after the diversification. A lower evolutionary rate of amino acid substitution results.
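The logic of Fig. 4 can be illustrated with a toy calculation. The sketch assumes, purely for illustration, that the deleterious effect of a replacement is multiplied by a constant amplification factor (for example the number of traits the gene affects), and it counts which replacements remain below the 1/Ne threshold. The selection coefficients, the population size, the linear amplification, and the function name are all hypothetical and are not estimates from this study.

```python
from typing import Sequence


def fraction_effectively_neutral(
    selection_coefficients: Sequence[float],
    effective_population_size: float,
    amplification: float = 1.0,
) -> float:
    """Fraction of replacements whose amplified effect stays below 1/Ne."""
    threshold = 1.0 / effective_population_size
    neutral = sum(
        1 for s in selection_coefficients if s * amplification < threshold
    )
    return neutral / len(selection_coefficients)


# Hypothetical spectrum of slightly deleterious effects for one gene.
coefficients = [1e-6, 4e-6, 2e-5, 6e-5, 2e-4, 6e-4, 2e-3, 6e-3]
hypothetical_ne = 1e4

# Gene affecting one simple trait (Fig. 4 Top).
simple = fraction_effectively_neutral(coefficients, hypothetical_ne, 1.0)

# Same gene after diversification, with every effect amplified 3-fold
# (Fig. 4 Middle or Bottom under the assumed linear amplification).
diversified = fraction_effectively_neutral(coefficients, hypothetical_ne, 3.0)
```

Under these assumed values the effectively neutral fraction falls after diversification, so fewer replacements would be fixed over the same amount of evolutionary time, which is the qualitative prediction of the model.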

In a given Wnt gene (or any developmental regulator), the same amino acid replacement may be effectively neutral for a species in which it affects only one trait (Fig. 4 Top) but may be efficiently selected against in a species in which the mutation affects several traits (Fig. 4 Middle). As a consequence, in the species in which the gene is necessary for the development of several traits, a smaller fraction of all possible amino acid replacements is effectively neutral and fewer will get fixed over the same amount of evolutionary time. Similarly, diversification of genes acting downstream in the developmental pathway of the Wnt gene, like those encoding the still unidentified receptors for the Wnt gene products and their signal-transduction pathways, can increase the complexity of the trait (Fig. 4 Bottom). A mutation in the Wnt gene that would have been effectively neutral in a species in which the trait is simple has an amplified deleterious effect in the species where the trait is more complex. Again, a smaller fraction of all possible amino acid replacements in the Wnt gene is effectively neutral, and fewer will get fixed over the same amount of evolutionary time.

For the Wnt genes, both scenarios are probably true. In the mouse, several Wnt genes are expressed in the embryonic anlagen for tissues that arose or dramatically increased in complexity during the origin of vertebrates. At least one paralog, Wnt-5a, is expressed in the forelimb bud and in cells derived from the neural crest (8). A major site of Wnt gene expression and function is the developing central nervous system (8-11). Importantly, despite some overlap, Wnt-3a and -3b, -5a and -5b, and -7a and -7b each have unique domains of expression (refs. 8 and 9; A. McMahon, personal communication), suggesting that these paralogs took up distinct functions after duplication. These patterns of gene expression and the molecular evolutionary characteristics described in this study imply that the diversification of the Wnt gene family facilitated generation of new cell types and developmental pathways that partly define the vertebrate body plan.

Structural and functional diversification of many other types of genes, such as cell adhesion molecules or transcription factors, must have also been necessary in the early evolution of the vertebrate body plan. For example, it is likely that the antennapedia/bithorax-Hox gene clusters duplicated during the same period of early vertebrate evolution (34), although this has not yet been conclusively established. It is not unreasonable to predict that the molecular evolutionary analysis of such developmental regulator genes, along with comparative studies of expression patterns, will be as fruitful for evolutionary biology as the functional characterization of these genes has been for developmental biology.

This work is dedicated to the late Allan C. Wilson, in whose laboratory this study was performed. I thank D. Banfield, D. Foltz, Frank's Fish Market, J. Kornegay, T. Quinn, B. Stickle, W. K. Thomas, and D. Wake for tissue or DNA preparations; D. Irwin, I. Matsumura, and S. Pääbo for practical advice; A. McMahon and R. Nusse for unpublished Drosophila sequences, comments, and discussion; S. Edwards, C. Goodman, M. Hosobuchi, T. Kocher, E. Prager, W. K. Thomas, H. Varmus, and D. Wake for comments and discussion; J. Felsenstein, T. Nguyen, and T. Speed for statistical advice; and the late Allan C. Wilson for irreplaceable support and advice. This work was sponsored by a National Institutes of Health grant to A. C. Wilson and an Abraham Rosenberg Research Fellowship to A.S.

1. Baker, N. E. (1987) EMBO J. 6, 1765-1773.
2. Patel, N. H., Schafer, B., Goodman, C. S. & Holmgren, R. (1989) Genes Dev. 3, 890-904.
3. van den Heuvel, M., Nusse, R., Johnston, P. & Lawrence, P. A. (1989) Cell 59, 739-749.
4. Wilkinson, D. G., Bailes, J. A. & McMahon, A. P. (1987) Cell 50, 79-88.
5. Shackleford, G. M. & Varmus, H. E. (1987) Cell 50, 89-95.
6. McMahon, J. A. & McMahon, A. P. (1989) Development 107, 643-650.
7. Roelink, H., Wagenaar, E., da Silva, S. L. & Nusse, R. (1990) Proc. Natl. Acad. Sci. USA 87, 4519-4523.
8. Gavin, B. J., McMahon, J. A. & McMahon, A. P. (1990) Genes Dev. 4, 2319-2332.
9. Roelink, H. & Nusse, R. (1991) Genes Dev. 5, 381-388.
10. Thomas, K. R. & Capecchi, M. R. (1990) Nature (London) 346, 847-850.
11. McMahon, A. P. & Bradley, A. (1990) Cell 62, 1073-1085.
12. Noordermeer, J., Meijlink, F., Verrijzer, P., Rijsewijk, F. & Destrée, O. (1989) Nucleic Acids Res. 17, 11-18.
13. McMahon, A. P. & Moon, R. T. (1989) Cell 58, 1075-1084.
14. Christian, J. L., Gavin, B. J., McMahon, A. P. & Moon, R. T. (1991) Dev. Biol. 143, 230-234.
15. Sokol, S., Christian, J. L., Moon, R. T. & Melton, D. A. (1991) Cell 67, 741-752.
16. Smith, W. C. & Harland, R. M. (1991) Cell 67, 753-765.
17. van Ooyen, A. & Nusse, R. (1984) Cell 39, 233-240.
18. Wainwright, B. J., Scambler, P. J., Stanier, P., Watson, E. K., Bell, G., Wicking, C., Estivill, X., Courtney, M., Boue, A., Pedersen, P. S., Williamson, R. & Farrall, M. (1988) EMBO J. 7, 1743-1748.
19. Busse, U., Guay, J. & Séguin, C. (1991) Nucleic Acids Res. 19, 981.
20. Molven, A., Njølstad, P. R. & Fjose, A. (1991) EMBO J. 10, 799-807.
21. Rijsewijk, F., Schuermann, M., Wagenaar, E., Parren, P., Weigel, D. & Nusse, R. (1987) Cell 50, 649-657.
22. Felsenstein, J. (1989) Cladistics 5, 164-166.
23. Felsenstein, J. (1981) J. Mol. Evol. 17, 368-376.
24. Hasegawa, M., Kishino, H. & Yano, T. (1985) J. Mol. Evol. 22, 160-174.
25. Kishino, H. & Hasegawa, M. (1989) J. Mol. Evol. 29, 170-179.
26. Nguyen, T. (1991) Ph.D. thesis (Univ. of California, Berkeley).
27. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold Spring Harbor, NY), 2nd ed.
28. Hein, J. (1989) Mol. Biol. Evol. 6, 649-668.
29. Carroll, R. L. (1988) Vertebrate Paleontology and Evolution (Freeman, New York).
30. Benton, M. J. (1990) J. Mol. Evol. 30, 409-424.
31. Pesole, G., Bozzetti, M. P., Lanave, C., Preparata, G. & Saccone, C. (1991) Proc. Natl. Acad. Sci. USA 88, 522-526.
32. Nusse, R., Brown, A., Papkoff, J., Scambler, P., Shackleford, G., McMahon, A., Moon, R. & Varmus, H. (1991) Cell 64, 231.
33. Ohta, T. (1973) Nature (London) 246, 96-98.
34. Duboule, D. & Dollé, P. (1989) EMBO J. 8, 1497-1505.
35. Russell, J. & Nusse, R. (1992) Development 115, in press.

