GE47CH15-Long ARI 31 August 2013 8:56

REVIEWS IN ADVANCE

New Gene Evolution: Little Did We Know

Manyuan Long,1,2,∗ Nicholas W. VanKuren,1,2 Sidi Chen,3 and Maria D. Vibranovski4

1 Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637; email: [email protected]
2 Committee on Genetics, Genomics, and Systems Biology, The University of Chicago, Chicago, Illinois 60637; email: [email protected]
3 Department of Biology and the Koch Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; email: [email protected]
4 Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil 05508; email: [email protected]
∗ Corresponding author

Annu. Rev. Genet. 2013. 47:325–51
The Annual Review of Genetics is online at genet.annualreviews.org
This article’s doi: 10.1146/annurev-genet-111212-133301
Copyright © 2013 by Annual Reviews. All rights reserved
Review in Advance first posted online on September 13, 2013. (Changes may still occur before final publication online and in print.)
Downloaded from www.annualreviews.org by University of Chicago Libraries on 11/14/13. For personal use only.

Keywords

evolutionary patterns, evolutionary rates, phenotypic evolution, brain evolution, sex dimorphism, gene networks

Abstract

Genes are perpetually added to and deleted from genomes during evolution. Thus, it is important to understand how new genes are formed and how they evolve to be critical components of the genetic systems that determine the biological diversity of life. Two decades of effort have shed light on the process of new gene origination and have contributed to an emerging comprehensive picture of how new genes are added to genomes, ranging from the mechanisms that generate new gene structures to the presence of new genes in different organisms to the rates and patterns of new gene origination and the roles of new genes in phenotypic evolution. We review each of these aspects of new gene evolution, summarizing the main evidence for the origination and importance of new genes in evolution. We highlight findings showing that new genes rapidly change existing genetic systems that govern various molecular, cellular, and phenotypic functions.




    BACKGROUND AND HISTORICAL OVERVIEW

    Understanding how genes originate and subsequently evolve is crucial to explaining the genetic basis for the origin and evolution of novel phenotypes and, ultimately, biological diversity. Gene origination is thus a widely interesting, yet difficult, problem to study. Perhaps unsurprisingly, the peculiar structures, functions, and evolution of evolutionarily new genes have attracted the interest of pioneers in genetics and evolution since the early twentieth century. Sturtevant (129) was one of the first to identify a duplicated gene, the Bar duplication in Drosophila melanogaster, from which Muller (103) developed the first prevalent model of new gene evolution in 1936. Muller (103, p. 529) predicted that a new duplicate copy of a gene could acquire a novel function and be preserved in the genome, and further that “there remains no reason to doubt the application of the dictum ‘all life from pre-existing life’ and ‘every cell from a pre-existing cell’ to the gene: ‘every gene from a pre-existing gene.’” This early thinking on single-gene and whole-chromosome duplications (55) was greatly expanded in the 1970s. Ohno (112) further developed Muller’s model in 1970, and Gilbert (52) proposed an entirely new model of new gene formation in 1978, whereby pieces of unrelated genes can be recombined into new genes rather than just be strictly duplicated. However, experimental work on new genes did not begin until the early 1990s, when a plausible framework for experimental studies of new gene formation and evolution was proposed: studies must focus on genes that were recently formed because young genes still carry all the signatures of the evolutionary forces that shaped their origination and the evolution of their new structures and functions (83). As genes age, they accumulate mutations that obscure the structural or evolutionary signals from their early history (53, 79). Genes younger than 10–30 million years have not experienced much sequence evolution and thus constitute a valid system in which to investigate the evolution of new genes and to understand their properties. This idea was first manifested in the discovery of jingwei, a three-million-year-old gene in two species of African Drosophila (85). Jingwei revealed several interesting features of new gene evolution that are now known to be general: (a) recombination of existing genes, leading to a hybrid gene structure; (b) rapid sequence evolution driven by positive selection; and (c) acquisition of new biochemical functions (150, 162).

    Today, it is clear that new gene origination is a general process in evolution and that species-specific or lineage-specific genes exist in many, if not all, organisms. Gigantic databases of genomic sequences from thousands of species reveal that genomes contain huge numbers and a large diversity of protein-coding genes. For example, the plant Glycine max genome encodes more than 50,000 protein-coding genes, whereas the bacterial genome of Candidatus Hodgkinia cicadicola contains only 189 genes. In addition, the abundance and diversity of non-protein-coding genes are only now beginning to be realized. Even genomes with similar gene numbers can have very different, unrelated genes. These recent data reveal a widespread process of birth and death of genes in organisms in which new genes enter the genome and old genes are lost. What mechanisms and forces dictate gene birth and death? Specifically, how are new genes and novel functions added to genomes?

    In the two decades since the discovery of jingwei, there have been several hundred additional publications reporting various interesting and significant observations of new genes and new gene functions in many different organisms. Regrettably, we can only choose a few representative publications to sketch several lines of observation that can provide insight into an emerging, global picture of new gene evolution. We follow the growth of scientific information and underlying ideas and concepts in new gene evolution, beginning by discussing the methods for identifying new genes and mechanistic processes of new gene formation. We then describe the rates and patterns of new gene origination and evolution that may indicate some rules governing these processes and discuss the evolutionary forces that act on new genes. Finally, we review the rapid growth of studies of the phenotypic effects of new genes and their impact on phenotypic evolution.

    Fixation: the population genetic process by which a mutation spreads to all individuals in a population

    Monophyletic group: a group of taxa that share a common ancestor

    THE CONCEPT OF NEW GENE ORIGINATION

    To understand various basic properties of new gene evolution, we need to have some conception of the process of new gene origination and an operational definition for the process. This definition helps us explore methods for new gene identification.

    The Process of New Gene Origination

    New gene origination is a microevolutionary process. A protogene structure is first generated by a mutation in a single germ-cell genome. This protogene structure must then spread through the population until it is fixed. Various evolutionary forces, such as natural selection and genetic drift, govern the spread of the protogene through the population, thus making protogene fixation a population genetic process. Both before and after fixation, the protogene accumulates mutations that confer on it new structures and beneficial, sometimes novel, functions that are acted on by natural selection. From the point that the protogene carries an optimized function and is fixed in the genome, it is essentially the same as most other, older genes in the genome and can be considered a new gene. New gene studies typically focus on these first two stages (the fixation process and acquisition of a beneficial function) and the consequences of accepted mutations on the sequence, structure, and function of the new gene. As the last section of this review shows, these microevolutionary changes produce macroevolutionary changes in traits such as development and brain function.
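The spread of a protogene under drift and selection can be illustrated with a small simulation. The sketch below assumes a diploid Wright-Fisher population, a textbook model that the review itself does not specify; the function names, population size, and selection coefficient are all hypothetical choices made for illustration. Under pure drift, a single new copy in a diploid population of size N is expected to fix with probability 1/(2N), and a positive selection coefficient should raise that fraction.

```python
import random


def simulate_fixation(pop_size, initial_copies, selection=0.0, rng=None,
                      max_generations=100_000):
    """Follow one protogene until it is fixed (True) or lost (False).

    Assumes a diploid Wright-Fisher population: 2 * pop_size gene copies,
    non-overlapping generations, binomial resampling each generation.
    """
    if rng is None:
        rng = random.Random()
    total = 2 * pop_size
    copies = initial_copies
    for _ in range(max_generations):
        if copies == 0:
            return False          # protogene lost from the population
        if copies == total:
            return True           # protogene fixed
        freq = copies / total
        # Selection shifts the expected frequency; selection=0 is pure drift.
        expected = freq * (1 + selection) / (1 + selection * freq)
        # Binomial sampling of the next generation's gene copies.
        copies = sum(1 for _ in range(total) if rng.random() < expected)
    raise RuntimeError("neither fixed nor lost within max_generations")


def fixation_fraction(pop_size, selection, trials, seed=0):
    """Fraction of trials in which a single new copy reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_fixation(pop_size, 1, selection, rng)
                for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to keep the run short.
    print("neutral:  ", fixation_fraction(pop_size=50, selection=0.0, trials=2000))
    print("selected: ", fixation_fraction(pop_size=50, selection=0.05, trials=2000))
```

Real populations depart from these assumptions (changing size, linkage, dominance), so the numbers are only a qualitative guide to why most protogenes are lost and few are fixed.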

    Interest in new gene origination has raised several general problems. What molecular mechanisms generate new gene structures? What are the evolutionary forces that drive the origination of new genes? How often are new genes fixed in a species? Are there any rules or patterns of new gene origination? What are the roles of new genes in phenotypic evolution? This review provides an overview of efforts to understand the answers to these problems.

    Approaches to Identifying New Genes

    All new gene identification methods are based on comparative analysis of the structures of genes and genomes. Within a group of closely related species, we can define new genes as those that are present in all members of a monophyletic group but absent from all outgroup species (Figure 1). Early studies often serendipitously identified new genes by analyzing the phylogenetic distribution of genomic DNA Southern blot signals or via characterization of small genomic regions (e.g., 85, 108). Microarrays (42, 44, 45) and especially next-generation sequencing (168, 169) have made recent searches for new genes more purposeful efforts.

    Multiple genomes. Syntenic alignments (Figure 1) of genomes can be used to identify new genes from related species for which we know the phylogenetic relationships. Syntenic alignments of each gene in each species allow identification of genes that are present or absent in one genome relative to another (Figure 1). In these comparisons, a gene can be defined as a new gene candidate if it is present in a certain clade or single species and absent in all outgroup species (Figure 1). Additionally, the orthologous genes that flank the new gene candidate should appear in all species under consideration. This strategy has been used with great success in Drosophila and mammals (35, 168, 169, 172). New genes formed by different mechanisms also have correspondingly different structural features that can be used to infer the mechanism of new gene formation and the ancestral and derived characters.
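The presence/absence criterion described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the cited studies: the nested-tuple tree, the `presence` table, and the `flanks` mapping are hypothetical inputs, and the branching order assumed among S1–S3 is an assumption (the text states only that S4 lacks G2). A gene is reported as a candidate when the species carrying it form a monophyletic group smaller than the whole tree and both flanking genes are found in every species.

```python
def leaves(tree):
    """Return the set of species names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every node, i.e. every monophyletic group."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def new_gene_candidates(tree, presence, flanks):
    """Map each candidate new gene to the clade in which it is present.

    presence: gene -> set of species whose genome contains the gene
    flanks:   gene -> (left flanking gene, right flanking gene)
    """
    all_species = leaves(tree)
    # Exclude the full tree: a gene found everywhere has no outgroup absence.
    monophyletic = set(clades(tree)) - {all_species}
    candidates = {}
    for gene, (left, right) in flanks.items():
        present_in = frozenset(presence[gene])
        flanks_conserved = (frozenset(presence[left]) == all_species
                            and frozenset(presence[right]) == all_species)
        if present_in in monophyletic and flanks_conserved:
            candidates[gene] = sorted(present_in)
    return candidates


if __name__ == "__main__":
    # Toy data modeled on the G1/G2/G3 scheme; the topology is assumed.
    tree = ((("S1", "S2"), "S3"), "S4")
    everywhere = {"S1", "S2", "S3", "S4"}
    presence = {"G1": everywhere, "G2": {"S1", "S2", "S3"}, "G3": everywhere}
    flanks = {"G2": ("G1", "G3")}
    print(new_gene_candidates(tree, presence, flanks))
```

A presence pattern that does not match any clade (for example, S1 and S3 only) is not reported, because it would require either independent gains or a secondary loss and so needs separate scrutiny.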

    Figure 1
    New genes are defined using syntenic and sequence comparisons between the genomes of a group of related species. (a) The general procedure to identify new genes. The relationship of species S1–S4 is shown by the blue tree. The relationships between the genes G1 (yellow), G2 (red), and G3 (green) are shown within the species tree. Aligning the genomes of species S1–S4 shows that the new gene G2 is present in S1–S3 but absent in S4, indicating that G2 arose in the common ancestor of S1–S3. G2 was thus generated in the genome between old genes G1 and G3 in the common ancestor of S1, S2, and S3 (red star). (b) An example of using syntenic alignments to identify new genes. Sdic exists only in Drosophila melanogaster (110, 160). In this case, Sdic originated as a chimeric gene through recombination of duplicates of the two flanking genes, a 5′ piece of Cdic encoding a cytoplasmic dynein intermediate chain and a 3′ piece of AnnX. [Panel b compares D. simulans, D. mauritiana, D. melanogaster, and D. yakuba.]

    Single genomes. Duplicate genes within a single genome can be identified using exhaustive pairwise comparisons between all annotated genes in that genome. Most mechanisms to form new gene structures (see below) result in certain structural changes in the new gene. For example, new genes created by RNA-based duplication (retrogenes) most often lack introns, contain a stretch of adenine nucleotides at their 3′ end, and are flanked by a pair of short direct repeats. These signals fade with evolutionary time. Betrán et al. (11), Bai et al. (4), and Meisel et al. (100) took advantage of these new structures to identify new retrogenes in fruit flies; Wang et al. (147) in silkworm; and Emerson et al. (43), Marques et al. (92), and Vinckenbosch et al. (144) in primates and specifically humans. Divergence between the new retrogene and the original gene from which the retrogene was derived can be used to define the age of the new genes using a molecular clock. However, both strategies that we have discussed so far can depend on the current annotations, which are biased against the newest genes, so caution must be taken when making claims about the presence/absence of genes in different genomes (167).
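The retrogene hallmarks and the molecular-clock dating mentioned above can be sketched as follows. Everything here is illustrative: the `DuplicatePair` record, the poly(A) length cutoff, the repeat length, and the substitution rate in the example are hypothetical values, not thresholds taken from the cited studies. The age estimate uses the standard relation T = Ks / (2r), where Ks is the synonymous divergence between the two copies and r is an assumed synonymous substitution rate per site per year; the factor of two reflects divergence accumulating along both copies.

```python
import re
from dataclasses import dataclass


@dataclass
class DuplicatePair:
    parent_introns: int   # introns annotated in the parental gene
    copy_introns: int     # introns annotated in the duplicate copy
    upstream: str         # genomic sequence just 5' of the copy
    downstream: str       # genomic sequence just 3' of the copy


def has_polya_tract(downstream, min_len=10):
    """True if the 3' flank carries a run of at least min_len adenines."""
    return re.search("A{%d,}" % min_len, downstream.upper()) is not None


def shared_direct_repeat(upstream, downstream, k=8):
    """Return a k-mer present on both flanks in the same orientation, or None."""
    upstream, downstream = upstream.upper(), downstream.upper()
    for i in range(len(upstream) - k + 1):
        kmer = upstream[i:i + k]
        if kmer in downstream:
            return kmer
    return None


def retrogene_hallmarks(pair):
    """Report which of the three retroposition signatures are observed."""
    return {
        "intron_loss": pair.parent_introns > 0 and pair.copy_introns == 0,
        "polya_tract": has_polya_tract(pair.downstream),
        "direct_repeats": shared_direct_repeat(pair.upstream, pair.downstream) is not None,
    }


def duplication_age(ks, rate_per_site_per_year):
    """Molecular-clock age in years from synonymous divergence: T = Ks / (2r)."""
    return ks / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    pair = DuplicatePair(
        parent_introns=3,
        copy_introns=0,
        upstream="GGCTAGCTTACGG",
        downstream="TTG" + "A" * 12 + "CCGCTAGCTTACGT",
    )
    print(retrogene_hallmarks(pair))
    # Hypothetical rate, for illustration only.
    print(duplication_age(ks=0.1, rate_per_site_per_year=1e-8))
```

Because these signals fade with time, a candidate that shows only one hallmark may still be an old retrogene, and a simple k-mer match can also arise by chance in low-complexity sequence.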

    Predicting functionality of new genes. It is desirable to predict whether candidate new genes are functional before beginning more laborious functional and phenotypic analyses. Comparisons of open reading frame length, transcription of new gene candidates, and substitution rates between nonsynonymous and synonymous sites (Ka versus Ks) and polymorphism and divergence (60, 97) are often used to predict whether the new gene is functional. A Ka/Ks ratio significantly lower than one (for single genome data, Ka/Ks < 0.5 in a comparison between the new gene and its parental copy), for example, indicates functional constraint acting on the new gene, which we would expect if disruptive mutations were being prevented from accumulating in new protein-coding genes by natural selection. These methods are widely used as the first step to predict if a new gene is likely functional (e.g., 4, 11, 43, 147, 168, 169).
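As a rough illustration of the Ka/Ks screen, the sketch below applies the 0.5 cutoff quoted above to a pairwise comparison between a new gene and its parental copy. The function names and the example values are hypothetical, and the sketch deliberately omits the statistical test needed to show that a ratio is significantly below one; it only demonstrates the thresholding step.

```python
def ka_ks(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks


def constraint_call(ka, ks, threshold=0.5):
    """Label a new gene/parental copy comparison using a simple Ka/Ks cutoff."""
    ratio = ka_ks(ka, ks)
    return "constrained" if ratio < threshold else "not clearly constrained"


if __name__ == "__main__":
    # Hypothetical divergence estimates for two candidate new genes.
    print(constraint_call(ka=0.02, ks=0.10))
    print(constraint_call(ka=0.09, ks=0.10))
```

A candidate that fails this screen is not necessarily nonfunctional: very young genes may have too few substitutions for the ratio to be informative, and positive selection can push Ka/Ks upward.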

    MECHANISMS TO FORM NEW GENE STRUCTURES

    How are new gene structures formed? Mutation toward a new gene structure is the first step of new gene evolution, and at least a dozen distinct molecular processes are known that contribute to the formation of new genes. These mechanisms are covered in depth elsewhere (65, 84), so we only briefly touch on them here. We highlight several examples in Figure 2.


    Pseudogenes: genes that are thought to have lost their ability to code for a full-length protein

    Gene Duplication

    Gene duplication is thought to contribute most to the generation of new genes. A single (or a few) new gene structure(s) can be formed at one time by DNA-based duplication (the copying and pasting of a DNA sequence from one genomic region to another) or retroposition. Although DNA-based duplications are often tandem (134), retroposed genes most often move to a new genomic environment (14, 15, 65, 172), where they must acquire new regulatory elements or risk becoming processed pseudogenes. An important gene duplication mechanism is whole-genome duplication (WGD), which has occurred multiple times in eukaryote evolution, particularly in plants (126). Hundreds to thousands of duplicate genes are formed by a WGD event, and the vast majority of duplicates are quickly lost. However, estimates of duplicate gene retention after WGDs in teleost fishes (∼15% after 350 million years) (16), yeast (∼12% after 80 million years) (68), and Arabidopsis (∼30% after 80 million years) (13) all suggest that large fractions of duplicated loci can be retained. We show below that there are a variety of ways that new gene structures can subsequently acquire new functions (2, 33, 61, 78, 158, 170). McLysaght et al. (98) showed that WGD may more easily generate new paralogs.

    Alteration of Existing Gene Structures

    New gene structures can be generated by modifying existing genes, domains, or exons. Gilbert (52) proposed that exons and domains could be recombined to produce new chimeric gene structures (Figure 2a,b). Chimeric proteins formed by gene recombination have been found in many organisms since their discovery in the LDL receptor gene (86, 130), including yeast (133), Drosophila (85, 118, 119), Caenorhabditis elegans (67), mammals (92), and plants (151), and are estimated to have contributed ∼19% of new exons in eukaryotes (see Reference 74 and references therein). In addition, retroposed sequences may jump into or near existing genes and recruit existing exons, or be recruited into an existing coding sequence (164). Conversely, new gene structures may be formed by splitting existing genes. Wang et al. (149), for example, found that gene duplication is an intermediate stage in an evolutionary process leading to gene fission (Figure 2c). Okamura et al. (113) demonstrated that frameshift mutations often generate new coding sequences and found 470 human gene duplicates that had done so. Xue et al. (157) found that the Epstein-Barr virus contains an early gene that undergoes frequent frameshifts, probably to combat host immunity. In addition, divergence in alternative splicing patterns between duplicate genes can generate distinct transcripts that produce noncoding RNAs or polypeptides with slightly or entirely different functions and rapidly alter duplicate gene structures and functions (51, 57, 69, 163, 173).

    De Novo Genes

    New gene structures may arise from previously noncoding DNA (Figure 2d). Chen et al. (24) were the first to show that antifreeze proteins, which bind and halt the growth of ice crystals in the blood of some polar fishes, were created by amplification of previously noncoding microsatellite DNA. Since then, a number of de novo genes originating from noncoding regions have been identified in Drosophila (6, 26, 75, 168, 172), humans (71, 153, 155, 169), primates (137), murine rodents (104), protozoa (159), yeast (17, 21), rice (154), and viruses (122). Similar to strict de novo gene origination, horizontal gene transfer (HGT), the exchange of genes between genomes from distantly related taxa, can immediately add new genes and functions to a genome (Figure 2f). HGT is a major mechanism for the addition of new genes to prokaryotic genomes (73, 111) but has also been reported in a number of eukaryotic organisms, including plants (8, 161), insects (102), and fungi (56) (Figure 2f).
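One way to picture the de novo route described above is to ask how a single enabling mutation changes the longest open reading frame in a stretch of noncoding DNA. The sketch below is a toy illustration only: the sequences are invented, the `longest_orf` helper is hypothetical, and it scans just the forward strand, whereas real analyses would also consider the reverse strand, transcription evidence, and orthologous noncoding sequence in outgroups.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-to-stop ORF on the forward strand, in codons (stop excluded)."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a reading frame
            elif start is not None and codon in STOPS:
                best = max(best, (i - start) // 3)
                start = None                   # close it at the stop codon
    return best


if __name__ == "__main__":
    # Invented sequences: the derived allele differs by one substitution
    # (TGA -> TGG) that removes a premature stop codon.
    ancestral = "ATGGCTTGACCGGAATTCTAG"
    derived = "ATGGCTTGGCCGGAATTCTAG"
    print(longest_orf(ancestral), longest_orf(derived))
```

In this toy case the substitution extends the reading frame from two codons to six, which mirrors in miniature how the loss of an ancestral stop codon can expose a longer coding sequence.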

    Noncoding RNAs

    Not all new genes code for proteins. Noncoding RNAs were found to play an important role in neuronal functions in the early 1990s (136). A large number of functional RNAs from noncoding regions have been reported to play vital roles in a wide variety of organisms (7, 80). MicroRNAs appear to turn over rapidly, but can be strongly influenced by positive selection (89, 90, 109). Strikingly, Dai et al. (34) showed that a new long noncoding RNA influences courtship behavior in D. melanogaster. Pseudogenes are conventionally thought of as dead genes that play no functional roles (41), but they may evolve functions in regulating expression of related genes. Zheng & Gerstein (171) recently found that many mammalian pseudogenes are transcribed and thus may still function. McCarrey & Riggs (96) predicted that pseudogenes may regulate their parental genes, similar to long noncoding RNAs or miRNAs. An explicit mechanistic model of the use of pseudogene transcripts as decoys for cross-regulating expression of target genes was actually proposed and tested by Marques et al. (93, 94).

    New Gene Regulatory Systems

    New genes must acquire a specific transcription regulatory system to ensure certain temporal and spatial expression patterns. Betrán et al. (10) investigated the origin of the male-specific expression of Dntf-2r, a retroposed gene in the D. melanogaster–Drosophila simulans clade. The new retrogene did not contain the parental promoter but had acquired a new β2-tubulin-like promoter by recruiting a novel 5′ regulatory sequence. This regulatory sequence drives testis-specific expression of β2-tubulin and appears to still do so for Dntf-2r. In addition, the new retrogene Xcbp1 recruited existing neuron promoters present at its site of integration (29). This co-opted mode of promoter recruitment is also observed in human retrogenes (144) and may be a general mode for retrogene promoter gain (65). Additionally, Ni et al. (107) observed that eight new genes essential for Drosophila development evolved binding sites for the CCCTC binding factor (CTCF) insulator under positive selection, ensuring the delineation of the regulatory domains of these genes.

    Transposable Elements

    Transposable elements (TEs) can contribute to functional divergence between duplicate genes through several methods, all similar to those described above (12). For instance, TEs can mediate gene recombination by carrying coding sequences from one part of the genome to another (63, 158) and can even themselves be incorporated into existing coding sequences (46, 88, 106). In addition, TEs were recently found to be a source of microRNAs, which are major components of posttranscriptional regulation of expression (116).

    Although we still have a developing pictureof the contributions of each of these mecha-nisms for new gene formation in different taxa,

    ←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−Figure 2Representative new genes exhibiting various new gene origination mechanisms. (a) Jingwei, a new gene found only in Drosophila teissieriand Drosophila yakuba, was generated by a combination of retroposition, DNA-based duplication, and gene recombination, whichformed a chimeric gene consisting of Adh-derived enzymatic domain and a hydrophobic domain from Ymp (85, 150). (b) PIPSL inhumans is a consequence of gene fusion between two adjacent ancestral genes by read-through transcription and subsequentcoretroposition (164). (c) Gene fission split the ancestral gene monkeyking into two distinct genes in Drosophila mauritiana, revealing anintermediate process of gene fission aided by gene duplication and complementary degeneration (149). (d) The geneENSMUSG00000078384 in mouse revealed the evolutionary process of de novo gene origination (104). Red boxes are ancestral stopcodons (TGA) with two triangles showing the positions of the enabling mutations, including a substitution and a deletion. (e) Two newgenes in humans, DAF and mNSCI, were generated by domesticating transposable elements, Alu, and short interspersed elements(B1–B4) (91, 106). DAF and Alu elements together make an interesting case in which alternative splicing generated a new isoform inthe mammalian genome. ( f ) Horizontal gene transfer (HGT) is prevalent in bacteria with mechanisms such as homologousrecombination (111). Antibiotic resistance genes can be acquired by host genomes containing the intl gene (which encodes integrase), arecombination site (att), and a promoter to express the captured gene, as depicted by the process shown in the three panels on the left.

    work in humans and Drosophila suggests that ∼80% of genes are formed by DNA-based duplication, 5% to 10% by de novo origination, and ∼10% by retroposition (168, 169). And although these mechanisms may generate the initial gene structures, many new structures (in a large variety of taxa) undergo radical structural renovation to change exon-intron structure and even recruit new or existing coding sequences into the new locus (30, 49, 151, 172).

    Evolution of Transcription Units

    Beyond the origination and evolution of the macrostructure of genes described above, it was recently found that the transcription units of vertebrate genes have been evolving directionally toward productive transcription. Almada et al. (1) reported a highly significant linear correlation between gene age and the critical signals that define transcription units in a gene, including U1 small nuclear ribonucleoprotein recognition sites and polyadenylation sites (PASs). The observed incremental gain of U1 sites and gradual loss of PASs in the 5′ end of protein-coding genes revealed selection for a U1-PAS axis for productive transcription.

    ABUNDANCE AND ORIGINATION RATES OF NEW GENES

    The advent of whole-genome sequences for many organisms allowed identification of many new DNA-based and RNA-based duplicate genes (e.g., 11, 43). With more genome sequences available, especially in closely related groups such as the twelve Drosophila species (32), it became possible to investigate the rates of new gene origination in particular lineages. We review these findings in Drosophila, mammals, and plants. There have been no reports of new gene origination rates for mechanisms other than DNA-based duplication, RNA-based duplication, de novo origination, and gene recombination. Thus, the rates of new gene origination we highlight should be viewed as serious underestimates.

    Drosophila

    The first estimate of the rate of new gene origination was made for retrogenes in Drosophila in 2002 by Betrán et al. (11), who identified ∼150 retrogenes in D. melanogaster (4, 11) that arose after the divergence of the Drosophila and Sophophora subgenera approximately 50 Mya. Their estimate of three new retrogenes per million years in the lineage leading to D. melanogaster was corroborated by an independent estimation of ∼1.5 new retrogenes per million years based on cDNA hybridization against salivary polytene chromosomes in species in the D. melanogaster subgroup (∼25 million years old) (158). Zhou et al. (172) computationally estimated the combined rate of new gene origination by DNA-based duplication, retroposition, de novo origination, and gene recombination in the D. melanogaster subgroup to be 5–11 new genes per million years and found different rates for the four mechanisms. In particular, approximately 80% of new genes added to the genome in the D. melanogaster lineage were generated by DNA-based duplication. More extensive and detailed analyses of DNA-based and RNA-based duplicates were conducted by Vibranovski et al. (142), Meisel et al. (100), and Zhang et al. (168). Zhang et al. (168) analyzed the 12 Drosophila genomes and estimated that ∼17 duplicate genes per million years arose in the Drosophila genome. Figure 3a shows the distribution of these new genes on the Drosophila phylogeny.
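
    The arithmetic behind such lineage-specific rates is simple: the number of new genes assigned to a lineage divided by the time since that lineage diverged. The following is a minimal sketch, assuming that simple ratio is what is meant; the function name `origination_rate` is ours, and the inputs are the approximate figures quoted above.

```python
def origination_rate(n_new_genes: float, divergence_my: float) -> float:
    """Average number of new genes per million years along one lineage."""
    if divergence_my <= 0:
        raise ValueError("divergence time must be positive")
    return n_new_genes / divergence_my


# Approximate figures quoted in the text: ~150 retrogenes that arose
# after a divergence roughly 50 million years ago.
retrogene_rate = origination_rate(150, 50)
print(f"{retrogene_rate:.1f} retrogenes per million years")
```

    Because the gene counts and divergence times are themselves estimates, a rate obtained this way is best read as an order-of-magnitude summary.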

    Mammals

    Emerson et al. (43) and Marques et al. (92) identified ∼120 retrogenes in the human genome, yielding an estimated retrogene origination rate of one retrogene per million years in the lineage leading to humans. Zhang et al. (166, 169) systematically identified new genes in vertebrates, especially in primates, and showed that the rates of new gene origination are variable

    Figure 3. The phylogenetic distribution of new gene origination events in (a) Drosophila and (b) vertebrates. These genes were generated by DNA-based duplication, retroposition, and de novo origination (168, 169). The number of new genes that originated in each time period is shown above the branch. For example, in a, branch 1 shows that 220 genes originated between 36 and 41 Mya in Drosophila. In b, red numbers are new genes that originated in the hominoid branches or specifically in humans.

    [Figure not reproduced. Panel a: a time-scaled phylogeny (Mya) of D. melanogaster, D. sechellia, D. simulans, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. pseudoobscura, D. willistoni, D. grimshawi, D. mojavensis, and D. virilis, with branches numbered 0–6. Panel b: a time-scaled phylogeny (Mya) of human, chimp, orangutan, rhesus, marmoset, mouse, guinea pig, dog, cow, armadillo, tenrec, opossum, platypus, chicken, lizard, frog, fugu, and zebrafish, with branches numbered 0–12.]

    in different evolutionary stages of vertebrates (Figure 3b), although on average 25–30 genes generated de novo and by DNA-based and RNA-based duplication arise per million years. Interestingly, this rate is much higher on the branches closer to human (66 new genes per million years in the human lineage alone) (166).

    Plants

    In contrast to flies and mammals, Zhang et al. (165) reported that 0.6 retrogenes per million years arose in the Arabidopsis thaliana genome, a rate comparable to that in Populus (174), and a microarray-based study in Arabidopsis identified 94 new genes created by DNA-based duplication and retroposition (45). Surprisingly, Wang et al. (151) found a very high rate of retrogene and chimeric gene origination in rice: More than 1,000 retrogenes were identified in the rice genome, 380 of which evolved chimeric gene structures by recruiting previously existing genes into their gene structures. These authors determined the rate of chimeric gene origination to be 7 per million years in grass genomes in the lineage leading to rice, 50 times the origination rate of chimeric genes in humans (144), and the highest rate of chimeric gene origination known. In addition, Jiang et al. (63) identified more than 3,000 gene recombinants in rice mediated by Pack-Mutator-like transposable elements (Pack-MULEs). These results suggest a huge potential for protein diversity in plant genomes.
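
    The comparison between rice and humans can be unpacked with a line of arithmetic: if 7 chimeric genes per million years is 50 times the human rate, the implied human rate follows directly. The sketch below only restates the two figures quoted above; the helper name `fold_difference` is ours, and the implied human rate is a derived value, not one reported in the text.

```python
def fold_difference(rate_a: float, rate_b: float) -> float:
    """How many times larger rate_a is than rate_b."""
    if rate_b <= 0:
        raise ValueError("reference rate must be positive")
    return rate_a / rate_b


# Values quoted in the text above.
rice_chimeric_rate = 7.0      # chimeric genes per million years, rice lineage
rice_over_human_fold = 50.0   # rice rate is "50 times" the human rate

# Human chimeric gene rate implied by those two statements (derived here).
implied_human_chimeric_rate = rice_chimeric_rate / rice_over_human_fold
print(f"implied human rate: {implied_human_chimeric_rate:.2f} per million years")

# Consistency check: the fold difference recovers the quoted factor.
print(f"fold: {fold_difference(rice_chimeric_rate, implied_human_chimeric_rate):.0f}")
```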

    Along with these extensive studies in Drosophila, mammals, and plants, there have been many valuable investigations of chimeric genes and retrogenes in Caenorhabditis elegans (66), fish (25, 49), silkworm (147), and chicken (62).

    Copy Number Variation

    Inexpensive whole-genome analysis has also made it possible to identify genes at the very earliest stages of their evolution, before fixation. Abundant copy number variation (CNV) of individual genes has been detected in Drosophila (40, 42, 124), humans (47), mouse (54), and C. elegans (81). Dopman & Hartl (40), Emerson et al. (42), Cardoso-Moreira & Long (20), and Cardoso-Moreira et al. (19) identified more than 1,000 partial and 100 complete gene duplications/deletions in just 15 strains of D. melanogaster relative to the reference genome using microarray hybridization. In addition, next-generation sequencing and microarrays have identified more than 1,200 partial and 600 complete gene duplications/deletions in 179 individual human genomes relative to the reference genome (101, 125). The recent sequencing of 43 genomes in two D. melanogaster populations detected more CNVs, including 2,588 duplications and 3,336 deletions relative to the reference genome (74). The large number of new genes segregating in populations is just now beginning to be appreciated and investigated further. An active area of research will be to perform functional and statistical analyses of these new genes to understand their earliest stages of evolution. In all, these studies have shown that new gene origination rates can differ between taxa, yet are appreciable in all groups studied. These results further strengthen the conclusion that new gene origination is a general evolutionary process.

    PATTERNS OF NEW GENE ORIGINATION

    Gene Traffic in Drosophila, Humans, and Other Organisms

    With the large number of new genes identified in various organisms, researchers were able to investigate statistical patterns of new gene characteristics to explore the mechanistic and evolutionary forces that impact the formation, origination, and evolution of new genes. Betrán et al. (11) examined the chromosomal distribution of retrogenes and their parental copies in D. melanogaster (Figure 4a). Surprisingly, these authors found a significant excess of autosomal retrogenes derived from X-linked parental genes (X→A) and a significant deficiency of retrogenes formed in the opposite direction (A→X) or between autosomes

    Figure 4. Retrogene traffic in (a) Drosophila (11, 142) and (b) humans (43). Each arrow indicates the movement of retrogenes from the parental gene chromosomal location to the retrogene’s location. The size of the arrow indicates the intensity of gene movement between chromosomes, and the percentages show quantitatively the excess of movement over the null expectation (random origination and insertion). The functions of the retrogenes are indicated.

    [Figure not reproduced. Panel a (Drosophila): arrows between the X and the autosomes carrying the labels 114%, –39%, and –33%, with an excess of male-biased functions indicated. Panel b (humans): arrows between the X and the autosomes carrying the labels 299%, 260%, and –10 ~ –12%, with an excess of male-biased functions and an excess of non-sex and female functions indicated.]

    (A→A). Bai et al. (4) further revealed that retrogenes derived from autosomal parental copies tend to locate to the same chromosome as the parental copies. However, 42 of the 43 retrogenes derived from X-linked parental genes exhibited X→A movement; only one retrogene moved X→X. These two observations clearly reveal a striking pattern of new gene origination in flies: Retrogenes derived from X-linked genes preferentially copy into autosomes. This directional movement of new genes is called gene traffic (43). These results hold in the 12 sequenced species of Drosophila (100, 142) and in Anopheles gambiae (5, 138). Interestingly, 90% of X→A retrogenes in D. melanogaster are expressed in testis, a significantly higher proportion of testis-expressed genes than average (11), suggesting that the retrogene’s function (in this case, male-beneficial function) can influence its relocation. The symmetric pattern was observed in silkworm, which has ZW sex determination (females are ZW and males ZZ), whereby genes retroposed from Z→A tend to be ovary expressed (147). Gene traffic appears to be general in Drosophila for different mechanisms of new gene formation, as Vibranovski et al. (142) also showed that new genes created by DNA-based duplication exhibit the same X→A movement and testis expression. Moreover, the neo-X chromosome, an autosomal chromosome arm that fused to the ancestral X chromosome during the evolution of the Drosophila genus, also shows the same excess of gene traffic (100, 142).
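
    An excess of movement over a null expectation can be expressed as a percentage and, for a single class such as X-derived retrogenes, checked with a binomial tail probability. The sketch below is illustrative only: it reads the 43 retrogenes as those with X-linked parents (our interpretation), and the null probability `p_null` is a hypothetical placeholder, since the text does not give the value expected under random origination and insertion.

```python
from math import comb


def percent_excess(observed: float, expected: float) -> float:
    """Excess of observed over expected, as a percentage of expected."""
    return 100.0 * (observed - expected) / expected


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_x_derived = 43   # retrogenes read here as having X-linked parents
n_x_to_a = 42      # of which moved to an autosome (from the text)
p_null = 0.80      # HYPOTHETICAL chance that a random insertion is autosomal

expected_x_to_a = n_x_derived * p_null
print(f"excess over null: {percent_excess(n_x_to_a, expected_x_to_a):.1f}%")
print(f"P(>= {n_x_to_a} of {n_x_derived}): "
      f"{binomial_upper_tail(n_x_to_a, n_x_derived, p_null):.2e}")
```

    A realistic null would account for the number of potential parental genes on each chromosome and the size of each target chromosome; the single probability used here stands in for that calculation.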

    Relative to Drosophila, human and mouse studies revealed similar yet distinct patterns of gene traffic (43). Compared with a neutral expectation based on the chromosomal distribution of processed pseudogenes, which are expected to be evolving neutrally, there is an excess of X→A retrogene movement, and most X→A retrogenes exhibit testis expression. However, there is also a significant excess of A→X retrogene movement in humans, and these A→X retrogenes exhibit either female expression or unbiased expression. A→A movement is very low in humans (43). The mouse genome shows a very similar pattern. Zhang et al. (166, 168) have shown that these patterns exist for DNA-based duplicates, retrogenes, and de novo genes in Drosophila, humans, and mouse.

    Consequences of Gene Traffic for Genome Evolution

    If gene traffic has been historically important for genome evolution, the majority of testis-biased/male-biased genes should be

    MSCI model: X chromosome inactivation during spermatogenesis favors relocation of genes involved in spermatogenesis to autosomes

    autosomal, contrary to the previous conclusion that the X was a hotbed for male-biased genes (148). Several microarray-based studies of male-biased genes and their chromosome locations by Ranz et al. (117) and Parisi et al. (114) in Drosophila, Khil et al. (70) in mouse, and later by Zhang et al. (166) in Drosophila, humans, and mouse have confirmed this prediction. In Drosophila, Zhang et al. (168) showed a smooth transition of new male-biased genes from X linkage to autosomal linkage over evolutionary time.

    Models to Interpret the Causes of Gene Traffic

    In general, models to explain gene traffic, and experimental evaluation of those models, show that natural selection is a major force governing gene traffic but that mutational processes likely also play a role (38). Meiotic sex chromosome inactivation (MSCI) in the male germ line (11, 43, 139, 140), dosage compensation in the heterogametic sex (3, 143), sexual antagonism between male- and female-beneficial genes (22, 128), and meiotic drive (131, 132) have all been implicated in driving gene traffic. The relative role of each of these forces has been hotly debated. MSCI has a strong effect in mammals (70), and experimental evidence for MSCI in Drosophila comes from several studies (59, 139, 140). Vibranovski et al. (139) showed that genes that are highly expressed in the meiotic phase of spermatogenesis (when the X chromosome is predicted to be inactivated) are significantly enriched on the autosomes. Conversely, genes expressed in the mitotic phases of spermatogenesis are randomly distributed throughout the genome. Other studies suggest reduced expression throughout spermatogenesis, including in the spermatogonia, which also discredits dosage compensation models (99; however, see 141). A clear-cut single-cell transcriptome is needed to clarify these issues. Along with the MSCI model, other non-germ-line-based models, e.g., sexual antagonism, are also necessary to interpret the expression of new genes in male somatic cells, although these models need to be rigorously tested experimentally.

    Correlation Between Gene Age and Expression

    Early studies revealed a connection between the expression and the ages of new genes. Betrán & Long (10) showed that Dntf-2r, a ∼10-million-year-old gene in the D. melanogaster subgroup, is expressed only in testis; however, its parent Dntf-2 is expressed ubiquitously. Almost all retrogenes in Drosophila appear to have testis expression (4) and to have maintained testis-biased or testis-specific expression independent of age (50). Vinckenbosch et al. (144) showed that new human retrogenes are often transcribed in testis and later evolve stronger and more diverse spatial expression patterns, coining the “out of the testis” hypothesis. Whether or not the testis is the starting point for new genes, a general survey of the expression patterns of new genes that originated within vertebrates revealed that both transcription intensity and spatial breadth of expression are strongly and positively correlated with gene age (167). It is possible that this testis-biased pattern of retrogene expression is due to our inability to detect genes expressed at low levels in different tissues, but this issue should be resolved soon with advances in next-generation sequencing.
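
    A correlation between gene age and expression breadth of the kind described above can be quantified with a rank correlation. The sketch below uses Spearman's coefficient as one reasonable choice (the statistic used in the cited survey is not stated here), and both the ages and the tissue counts are hypothetical values made up for illustration.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL data: gene age (million years) and number of tissues
# in which each gene is detectably expressed.
gene_age_my = [3, 6, 11, 25, 35, 40]
tissues_expressed = [1, 1, 2, 4, 3, 6]
print(f"Spearman rho = {spearman(gene_age_my, tissues_expressed):.2f}")
```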

    EVOLUTIONARY FORCES ACTING ON NEW GENES

    Evolutionary forces, such as natural selection and genetic drift, operate on both facets of new gene evolution: the fixation of new gene loci and their acquisition of a beneficial function. These two facets may overlap. In this section, we discuss theoretical models developed to describe how new genes arise and acquire novel functions as well as general approaches to studying new genes and the selective forces that act on them.

    Neofunctionalization: the process by which a new gene acquires a novel function

    Selective Models of New Gene Evolution

    Muller (103) was among the first to recognize the potential importance of duplicate genes in evolution. He proposed a simple model whereby new duplicate genes could acquire novel, beneficial functions distinct from those of the original copies. Ohno (112) elaborated on Muller’s model and named the fate Muller described neofunctionalization. However, Ohno also predicted that duplicate genes are most often inactivated and become pseudogenes. This classic model assumes that the new gene is functional upon duplication and that the new gene subsequently acquires mutations that provide a novel beneficial function. The novel function is then preserved in the genome by natural selection.

    However, strictly duplicate genes are redundant, and beneficial mutations are extremely rare. How do new duplicate genes remain in the population long enough to accumulate a beneficial, selected mutation(s)? This problem led to the development of models that predict selective preservation of both copies at all stages of their evolution: adaptive radiation (AR), innovation-amplification-divergence (IAD), and escape from adaptive conflict (EAC). The AR model proposes that gene duplication itself is favored, e.g., for increased dosage of a gene product, and that the new duplicates then undergo functional radiation (48). Thus, AR posits that novel functions are acquired after duplication. IAD and EAC, in contrast, propose that ancestral loci develop novel beneficial secondary functions before duplication (9, 36). Under IAD, repeated gene duplication is favored to increase the dosage of the novel secondary function. Different duplicates are then free to optimize the ancestral or novel secondary function, and only the two best copies are retained in the genome. The increase in the number of duplicate genes within the AR and IAD models also provides additional targets for beneficial mutations, thus increasing the probability and speed of functional improvement. EAC predicts that the bifunctional ancestral gene is subject to selection before gene duplication, that adaptive conflict between the ancestral function and the new function constrains improvement of the selected function(s) before duplication, and that adaptive changes and functional improvement occur in the daughter genes after duplication.

    For additional information on duplicate gene evolution, see Conant & Wolfe (33), who suggest that preservation of new genes stems from the co-option of existing functions to serve new purposes, and Walsh (145, 146), who gives a detailed mathematical description of the models and relative probabilities of neofunctionalization and pseudogenization.

    Examples of EAC (36), IAD (105), and AR (48) have been published, and each model makes specific predictions about what we should observe if a new gene originated by that process (33). However, none of these models can be used as a statistical framework for rigorously testing the roles of evolutionary forces in new gene origination. Classic molecular population genetic tests based on nucleotide substitution patterns and allele frequency spectra do provide this framework and have been used extensively to detect selection on new genes. These tests, such as the M-K (McDonald-Kreitman) test (97) and the HKA (Hudson, Kreitman, and Aguadé) test (60), detect elevated rates of amino acid substitutions (M-K) or reduced effective population size (HKA) at loci. In addition, Thornton (135) introduced a coalescent-based model that can be used to test for selection on CNV. The HKA test and Thornton’s test compare measurements of nucleotide variation in genes with a distribution of parameter values derived from neutral coalescent simulations. Thus, the M-K, HKA, and Thornton’s tests are used to test the classic model. Each of these five models (classic, AR, IAD, EAC, and statistical) predicts that new genes should experience strong natural selection after they are formed. We now discuss some of the evidence indicating that this often appears to be the case.
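
    The logic of the M-K test can be made concrete with a 2×2 table of nonsynonymous and synonymous changes, split into fixed differences between species and polymorphisms within species. The sketch below applies Fisher's exact test to such a table, which is one common way to assess it and not necessarily the statistic used in the studies cited here. The counts are illustrative placeholders, not data from any study reviewed in this article, and the function names are ours.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn: int, ds: int, pn: int, ps: int):
    """Return (neutrality index, p-value) for a McDonald-Kreitman table.

    dn, ds: fixed nonsynonymous and synonymous differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within species.
    Assumes dn > 0 and ps > 0 so that the neutrality index is defined.
    """
    p_value = fisher_exact_two_sided(dn, pn, ds, ps)
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, p_value


# ILLUSTRATIVE counts only.
ni, p = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"neutrality index = {ni:.3f}, p = {p:.4f}")
```

    A neutrality index below 1 together with a small p-value indicates more fixed amino acid differences than expected from the polymorphism data, the pattern usually read as evidence of positive selection.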

    Figure 5. Positive Darwinian selection acting on new genes. (a) Positive selection for the fixation of new retrogenes in Drosophila (124) and humans (123). The numerator and denominator show the numbers of retrogenes that originate on the autosomes and the X, respectively. Tests based on the M-K framework indicate an excess of fixed X→A retrogenes in both species and strong positive selection for X→A retrogene movement. (b) The jingwei (jgw) gene in Drosophila (85). The ratios over the branches are the numbers of nonsynonymous changes over the numbers of synonymous changes, and the ratios in the triangles are the ratios of divergence between the species and the polymorphisms. M-K tests and Ka/Ks ratios indicate strong positive selection acted on jgw shortly after it originated. (c) Selection acted on all Adh-derived chimeric genes in Drosophila (64), as indicated by elevated Ka/Ks ratios.

    [Figure not reproduced. Panel a: counts of fixed and polymorphic retrogenes originating on autosomes/the X, shown relative to parental genes, for D. simulans and D. melanogaster and for chimpanzee and humans. Panel b: the Adh and jingwei lineages in D. teissieri and D. yakuba following retroposition, annotated with ratios. Panel c: the Adh-derived chimeric genes jingwei, Adh-Finnegan, Adh-twain, and Siren, drawn for the D. teissieri-yakuba, D. hydei-mettleri, and D. subobscura-guanchi clades and for D. ananassae and the D. bipectinata complex, each annotated with KA and KS values.]

    Fixation of New Genes Within Species and Populations

    The first study to identify signatures of selection on a new gene journeying to fixation was performed by Llopart et al. (82), who analyzed a new variant of the jingwei gene in Drosophila teissieri that had lost its second intron. This D. teissieri–specific intron presence-absence polymorphism exhibits a significant excess of rare alleles and patterns of nucleotide polymorphism that are consistent with moderate natural selection driving the polymorphism to fixation. Selection has also been detected on CNV in D. melanogaster and other organisms. Emerson et al. (42) found a genome-wide pattern consistent with strong purifying selection on all CNV except duplications of whole genes. That is, single-gene duplications are under significantly weaker purifying selection than partial gene duplications or partial or complete gene deletions. Similarly, Schrider et al. (123, 124) showed a significant excess of fixed versus polymorphic retrogene CNV originating from the X chromosome in both Drosophila and humans, indicating that natural selection governs the patterns of retrogene CNV evolution (Figure 5a).

    Overall, these studies show that natural selection can play a key role in driving new genes to fixation. In addition, they highlight the use of classic population genetic tests in determining whether selection acts on new genes during their journeys to fixation.

    Selection on Sequence Changes in New Genes

    In addition to studies of the evolutionary forces governing the fixation of new genes, many studies have investigated the effects of selection and drift on new gene sequences. Long & Langley (85) showed that the new chimeric gene jingwei in D. teissieri and Drosophila yakuba contains a significant excess of nonsynonymous substitutions compared with nonsynonymous polymorphisms (relative to the ratio of synonymous substitutions to polymorphisms), indicating that amino acid substitutions were rapidly driven to fixation shortly after the origination of jingwei (Figure 5b). Similarly, Nurminsky et al. (110) showed that a D. melanogaster–specific gene family, Sdic, involved in sperm motility rapidly acquired a new exon-intron structure and testis-specific expression (Figure 1). Sdic is a chimeric gene composed of a 5′ piece of Cdic, encoding a cytoplasmic dynein intermediate chain, and a 3′ piece of AnnX, encoding a phospholipid-binding protein. This fusion gene underwent rapid structural renovations, including the conversion of a Cdic intron into an exon and of an AnnX exon and Cdic intron into a testis-specific promoter. Low levels of sequence polymorphism, preservation of coding potential, and the absence of Sdic in other closely related species suggest that Sdic was rapidly swept to fixation.

    These first discoveries sparked searches for general evolutionary patterns in new genes. Jones & Begun (64) searched for common patterns in the evolution of three Adh-derived chimeric genes in different lineages of Drosophila. All three new genes quickly accumulated a large number of amino acid replacement substitutions, several at identical amino acid sites, in the Adh-derived region shortly after they arose. Strikingly, Jones & Begun (64) and Shih & Jones (127) showed that different Adh-derived fusion genes often accumulate mutations at the same sites, regardless of which other gene they have fused to (Figure 5c). In addition, each of the four Adh-derived fusion genes exhibits strong signals of accelerated amino acid substitution in classic population genetic statistical tests (e.g., the M-K test).
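
    Accelerated amino acid substitution is often summarized by the ratio of nonsynonymous to synonymous divergence (Ka/Ks). The sketch below is a simplified calculation in the style of Nei and Gojobori, assuming that the numbers of nonsynonymous and synonymous sites and differences have already been counted from an alignment; it applies the Jukes-Cantor correction to each proportion. All counts are hypothetical, and the function names are ours.

```python
from math import inf, log, nan


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p == 0:
        return 0.0
    if not 0 < p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks > 0:
        ratio = ka / ks
    else:
        # With no synonymous change the ratio is undefined or unbounded.
        ratio = inf if ka > 0 else nan
    return ka, ks, ratio


# HYPOTHETICAL counts for one branch of a young gene.
ka, ks, ratio = ka_ks(nonsyn_diffs=12, nonsyn_sites=570, syn_diffs=3, syn_sites=190)
print(f"Ka = {ka:.4f}, Ks = {ks:.4f}, Ka/Ks = {ratio:.2f}")
```

    A ratio above 1 is the conventional signature of positive selection. When Ks is 0, as can happen on short branches, the ratio is not defined, and the raw counts of nonsynonymous and synonymous changes are more informative than the ratio itself.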

    Some of these observations have recently been borne out by genome-wide studies. Xu et al. (156) surveyed structural differences between more than 600 paralogous pairs of genes in plants and found that most new genes underwent radical changes in exon/intron content and boundaries as well as insertions/deletions. Using molecular population genetic tests, Chen et al. (30) found that young genes in D. melanogaster show strong signals of selection. These authors estimated that ∼25% of amino acid substitutions in young essential genes were fixed by natural selection. In addition, this signal of selection diminishes as genes grow older. Altogether, these studies indicate that there are general patterns to new gene evolution: New genes often undergo rapid (or immediate) structural and sequence renovations and expression pattern changes that are driven by strong natural selection.
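
    A proportion of adaptive substitutions such as the ∼25% figure above is typically obtained from M-K-type counts. The sketch below uses one commonly used estimator, α = 1 − (Ds·Pn)/(Dn·Ps); whether the cited study used exactly this form is an assumption on our part, and the counts are hypothetical values chosen so that the result equals 0.25.

```python
def alpha_mk(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimated fraction of amino acid substitutions fixed by selection.

    dn, ds: fixed nonsynonymous and synonymous differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within species.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be nonzero")
    return 1.0 - (ds * pn) / (dn * ps)


# HYPOTHETICAL counts, chosen to give alpha = 0.25.
alpha = alpha_mk(dn=40, ds=60, pn=10, ps=20)
print(f"alpha = {alpha:.2f}")
```

    Negative values of this estimator are possible and usually reflect segregating slightly deleterious polymorphisms rather than an absence of adaptation.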

    Analysis of New Gene Structure and Function

    In addition to analyses of new gene frequencies and nucleotide changes, many groups have investigated the evolutionary forces acting on new genes by analyzing new gene functions, genomic locations, or expression patterns. This complementary approach has revealed several fundamental patterns of new gene origination. Chen et al. (24) and Cheng & Chen (31), for example, investigated the antifreeze proteins found in the blood of several orders of Arctic and Antarctic fish. These proteins independently evolved in the different orders, yet they consist of nearly identical tripeptide repeats. These tripeptide repeats were generated de novo by amplification of short nucleotide sequences. These studies showed that similar

    environmental pressures may favor the generation of genes with similar functions.

    In addition, as we showed in the previous section, testis-biased genes are underrepresented on the D. melanogaster and mammalian X chromosomes. Diaz-Castillo & Ranz’s (38) analysis of the genomic location of genes relative to the position of chromosome domains during spermatogenesis led the authors to propose an alternative explanation: the enrichment of testis-biased retrogenes on the autosomes is caused by an increased availability during spermatogenesis of open chromatin domains that contain testis-expressed genes. This larger target for retrogene integration allows a higher proportion of these retrogenes to acquire testis-biased expression. These general observations of the location of sex-biased genes, and their general movement off of the X chromosome, indicate that differences in expression alone can dictate where in the genome new genes originate. Together, these results show that studies of general patterns of extant gene locations, structures, and expression can be informative about new gene origination and evolution.

    PHENOTYPIC EFFECTS OF NEW GENES

    Studying the roles of new genes in phenotypic evolution recently became feasible with the advent of sophisticated genetic tools and molecular techniques as well as significant progress in related areas of important phenotypes in biology. Young genes are often assumed to be dispensable because important functions are thought to require a long evolutionary period to be developed and optimized (76). However, studies in the past decade have found numerous young genes with important, and sometimes essential, functions at the molecular, cellular, and individual level (27).

    Biochemical Pathways

    New genes can generate new biochemical pathways and products if they are enzymes or become enzymes. Zhang et al. (162) showed that jingwei evolved the capacity to catalyze breakdown of long-chain alcohols in D. yakuba and D. teissieri, whereas the parent Adh can only act on short-chain alcohols. In Arabidopsis, Weng et al. (152) and Matsuno et al. (95) demonstrated that three recently evolved new duplicate genes from the P-450 family, Cyp98A9, Cyp98A8, and Cyp84A4, assembled two new biochemical pathways related to phenolic metabolism that are required for pollen development and α-pyrone synthesis.

    Gene Expression Networks

    New genes can also be quickly integrated into existing gene networks. Chen et al. (30) observed that almost all young essential genes have been assimilated into protein-protein physical interaction networks in Drosophila, and a significant number of these young genes have developed multiple interactions with old genes (Figure 6). Integration appears to be driven by natural selection. Several new genes have become new hubs. Analysis of one new gene, Zeus, derived from the DNA-binding protein Caf40 via retroposition (28), revealed that it retained ∼30% of Caf40’s DNA-binding sites. However, in a short evolutionary period (4–6 million years) Zeus acquired 193 new binding sites through which it activates or represses hundreds of downstream genes involved in reproduction. This observation indicates that gene expression networks can be rapidly and globally reshaped in evolution by new genes. Li et al. (77) showed that a de novo gene in yeast can suppress a previously existing mating type–control pathway, thus rewiring the structure of gene networks in the species. Capra et al. (18) revealed that new genes in yeast become more integrated into cellular networks over time. The modified networks are not necessarily novel or unimportant, either: Konikoff et al. (72) found that genes have been continually added to and removed from the Wnt and TGF-β signaling pathways, ancient networks involved in animal development.

    Figure 6
    New genes integrated into gene networks and reshaped those networks. (a) New yeast genes that originated through duplication-based (blue) and non-duplication-based (red) mechanisms since the recent whole-genome duplication (
    [Figure panels a–c not reproduced. Panel annotations: 4–6 Mya after generating Zeus through retroposition; 107 amino acid substitutions in Zeus; Zeus has created 193 new gene links and kept only 30% (129) of ancestral links of Caf40. Panel labels: Zeus; Caf40; nucleic acid binding groove; sequence-logo axis in bits, positions 1–21.]
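    The Zeus figures quoted above can be turned into a rough measure of how far the young gene's binding profile has diverged from that of its parent. The short Python sketch below is only an illustration of that arithmetic: it assumes that the 129 retained links and the 193 new links reported in Figure 6 are counted in the same units, and that the retained share is exactly 30%, neither of which is stated in the text. The function name and the dictionary keys are hypothetical.

```python
# Figures quoted in the text and in Figure 6 for Zeus and its parent gene Caf40.
RETAINED_SITES = 129       # ancestral Caf40 links that Zeus kept
RETAINED_FRACTION = 0.30   # share of Caf40 links retained (assumed exact here)
NEW_SITES = 193            # links gained by Zeus after retroposition


def rewiring_summary(retained, retained_fraction, gained):
    """Summarize how far a young gene's links have diverged from its parent's."""
    parental_total = retained / retained_fraction  # implied number of parental links
    derived_total = retained + gained              # all links of the young gene
    return {
        "parental_links_estimate": round(parental_total),
        "derived_links_total": derived_total,
        "fraction_derived_links_new": gained / derived_total,
        "fraction_parental_links_lost": 1 - retained_fraction,
    }


summary = rewiring_summary(RETAINED_SITES, RETAINED_FRACTION, NEW_SITES)
for name, value in summary.items():
    # Floats are shown to two decimals; counts are shown as integers.
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

    Under these assumptions, Caf40 would have roughly 430 links, and about 60% of the 322 links attributed to Zeus would be new, which is one way to quantify the statement that the network was reshaped within 4–6 million years.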

    Development

    Surprisingly, new genes can quickly acquire essential roles in development. Chen et al. (30) identified 59 genes that originated in the past ∼35 million years in Drosophila that evolved essential developmental functions. Silencing expression of these young genes causes development failure in early to late pupae and in some cases at even earlier stages (Figure 7a,b). Furthermore, tissue-specific knockdown of these young genes can cause morphological defects in adult flies. Silencing new genes can also have a critical effect on reproduction, even when the individual can complete development. The duplicate gene nsr (novel spermatogenesis regulator) exists only in the four species of the D. melanogaster clade that diverged 3 Mya, yet it evolved an essential function required for sperm individualization (39). Similarly, silencing Zeus, a gene in the same group of Drosophila, causes sterility by disrupting testis and sperm development (28).

    Recent work on Umbrea, a 12–15 million-year-old gene in Drosophila, carefully dissected the evolutionary steps this young gene took to becoming essential in D. melanogaster (121). Umbrea arose by DNA-based duplication of heterochromatin protein 6 (HP6) 12–15 Mya. Subsequent loss of one of its two domains (the chromodomain) and the accumulation of protein coding changes in the remaining chromoshadow domain gave Umbrea a distinct chromatin localization pattern at the centromere. Umbrea appears to have become essential only after it lost the chromodomain 5–7 Mya. Careful molecular dissection, ancestral protein resurrection, and population genetic analyses are the keys to understanding the processes and time new genes take to acquire important roles in organisms.

    Brain Evolution in Flies and Humans

    Chen et al. (29) investigated the expression patterns of new genes in Drosophila and found that approximately five new genes per million years evolved brain expression patterns, mostly in structures involved in olfaction and learning/memory. All new brain genes are expressed in the α/β lobe, an evolutionarily new set of neurons, implicating new genes in the evolution of this brain structure. Some of the new brain genes have significant effects on behavior. For example, Xcbp1 and Desr influence foraging behaviors (29), and sphinx influences courtship behaviors (34). The frequent acquisition of new brain genes by the genome and the behavioral phenotypes of some of these genes suggest rapid evolution of behaviors, which is consistent with the remarkable observations of Rollman et al. (120), who detected great variation in the olfactory behavioral response associated with odorant receptor gene duplicates within the natural population of D. melanogaster. The incorporation of new genes into the brain is not specific to Drosophila. Zhang et al. (166) found a correlation between new genes and brain evolution in the human lineage. A high proportion of hominoid-specific and human-specific genes are expressed in the prefrontal cortex and temporal lobe, the newest brain structures, in early fetal development. Strikingly, 54 of 380 human-specific genes are expressed in these two brain regions, regions that are critical for proper cognitive function. One of these genes, SRGAP2, is involved in neocortical development (23, 37).


    Figure 7
    The essential effects of new genes on development. (a) Development was terminated at the final stage when three different genes were knocked down using RNA interference (RNAi). (b) YLL1 originated in the common ancestor of the Drosophila melanogaster subgroup species ∼6–10 Mya, yet showed lethal effects in the pupal stage when silenced by RNAi, mutated by EMS, or disrupted by the P element (30).
    [Figure panels not reproduced. Panel a labels: CG6289, 9 Mya, early larva; G32376, 18 Mya, pharate; G13463, 30 Mya, early pupa. Panel b labels: YLL1; gene duplication; CG7627; species D. ananassae, D. erecta, D. yakuba, D. teissieri, D. simulans, D. mauritiana, and D. melanogaster; time axis in Mya (0–8).]

    Panel b, lines tested:
    LINES       METHOD/MUTATION            PHENOTYPES
    11343       P-element insertion        Lethal, pupal stage
    SH2-0995    EMS/G717S                  Lethal, pupal stage
    SH2-1101    EMS/T765I                  Lethal, pupal stage
    SH2-0504    EMS/synonymous             Viable
    V39539      RNAi/constitutive Gal4     Lethal, pupal stage
    V39540      RNAi/constitutive Gal4     Lethal, pupal stage

    Sexual Dimorphism and Sexual Reproduction

    New genes impact sexual dimorphism by participating in the genetic systems that control sexual reproduction and sex determination (87). As the aforementioned patterns of new gene origination show, the vast majority of new genes are sex-biased, especially male-biased, and their origination processes show directional copying between the sex chromosomes and autosomes (e.g., 11, 43). A number of new genes have been identified with various phenotypic effects, including testicular descent in theria (RLN3) (115), testis size in mouse (noncoding RNA gene, Poldi) (58), sperm competition in D. melanogaster (Sdic) (160), and spermatogenesis in Drosophila (nsr) (39).

    The ability of new genes to be incorporated into such conserved pathways, networks, and developmental programs warrants considerable further study. What specific roles can new genes play, and what characteristics of new genes enable them to become essential components of these processes so quickly? New genes now appear to be potent drivers of phenotypic evolution and the genetic control of important biological processes, and show that organismal development and organ development have evolved species-specific and lineage-specific components. Understanding the evolution and modification of these components through the incorporation of new genes is crucial to further research.

    CHALLENGES FOR THE FUTURE

    It is apparent that we have just a glimpse of the emerging world of new genes and that these genes play crucial roles in the rapid evolution of the genetic systems that govern biological diversity. Questions about new gene evolution have opened many doors both to our understanding of existing diversity and to new research. For example, most studies have examined new genes generated from a few mechanisms, e.g., duplication and de novo origination, leaving open a vast array of mechanisms to be investigated. Continued efforts will be invaluable for understanding the abundance of new genes, the mechanisms that have been neglected so far, and even new gene evolution in nonmodel organisms. An outstanding challenge is to understand the roles of new genes in the evolution and biology of phenotypes, and the studies we have highlighted have left important, unresolved questions to be answered. For example, what evolutionary forces drive gene traffic? How do new genes evolve essential developmental functions, and how quickly? How is CNV driven to fixation, and when do CNVs acquire novel functions? How are important structures, such as the human brain, able to incorporate new gene functions, and how do new genes contribute to novel cognitive function? Future studies of more, diverse phenotypes will help shed light on the general patterns and modes of new gene evolution and on the influence of new genes on evolving systems. In addition, understanding how phenotypes rapidly evolve will require a deep understanding of the underlying local and global gene networks. This will be a tremendous challenge, ranging from the experimental deciphering and graphic description of the gene networks to a valid comparative analysis of the ancestral and derived networks shaped by new genes and eventually to the causal relationship of the altered networks with the evolution of phenotypes.

    DISCLOSURE STATEMENT

    The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

    ACKNOWLEDGMENTS

    We thank all members of the Manyuan Long lab, past and present, for their scientific contribution to the relevant topics discussed in this review. We also thank the NIH, the NSF, and the Packard Foundation as well as the late Edna K. Papazian for their support of the study of new genes throughout the past fifteen years as we explored this new and exciting area. M.L. is currently supported by NIH grants 1R01GM100768-01A1, NSF1051826, and NSF1026200; N.W.V. by the NSF Graduate Research Fellowship and partially by the NIH genetics training grant T32 GM007197; S.C. by the NSF Doctoral Dissertation Improvement Grant DEB-1110607; and M.D.V. by a Pew Latin American Postdoctoral Fellowship.

    LITERATURE CITED

    1. Almada AE, Wu X, Kriz AJ, Burge CB, Sharp PA. 2013. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature 499:360–63

    2. Arguello JR, Chen Y, Yang S, Wang W, Long M. 2006. Origination of an X-linked testes chimeric gene by illegitimate recombination in Drosophila. PLoS Genet. 2(5):e77

    3. Bachtrog D, Toda NRT, Lockton S. 2010. Dosage compensation and demasculinization of X chromosomes in Drosophila. Curr. Biol. 20(16):1476–81

    4. Bai Y, Casola C, Feschotte C, Betrán E. 2007. Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in Drosophila. Genome Biol. 8(1):R11.1–1.9

    5. Baker DA, Russell S. 2011. Role of testis-specific gene expression in sex-chromosome evolution of Anopheles gambiae. Genetics 189(3):1117–20

    6. Begun DJ, Lindfors HA, Kern AD, Jones CD. 2007. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176(2):1131–37

    7. Berezikov E. 2011. Evolution of microRNA diversity and regulation in animals. Nat. Rev. Genet. 12(12):846–60

    8. Bergthorsson U, Adams KL, Thomason B, Palmer JD. 2003. Widespread horizontal transfer of mitochondrial genes in flowering plants. Nature 424(6945):197–201

    9. Bergthorsson U, Andersson DI, Roth JR. 2007. Ohno’s dilemma: evolution of new genes under continuous selection. Proc. Natl. Acad. Sci. USA 104(43):17004–9

    10. Betrán E, Long M. 2003. Dntf-2r, a young Drosophila retroposed gene with specific male expression under positive Darwinian selection. Genetics 164(3):977–88

    11. Betrán E, Thornton K, Long M. 2002. Retroposed new genes out of the X in Drosophila. Genome Res. 12:1854–59

    12. Böhne A, Brunet F, Galiana-Arnoux D, Schultheis C, Volff J-N. 2008. Transposable elements as drivers of genomic and biological diversity in vertebrates. Chromosome Res. 16(1):203–15

    13. Bowers JE, Chapman BA, Rong J. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:433–38

    14. Brosius J. 1991. Retroposons: seeds of evolution. Science 251(4995):753

    15. Brosius J. 2003. The contribution of RNAs and retroposition to evolutionary novelties. Genetica 118(2–3):99–116

    16. Brunet FG, Crollius HR, Paris M, Aury J-M, Gibert P, et al. 2006. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol. Biol. Evol. 23(9):1808–16

    17. Cai J, Zhao R, Jiang H, Wang W. 2008. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics 179(1):487–96

    18. Capra JA, Pollard KS, Singh M. 2010. Novel genes exhibit distinct patterns of function acquisition and network integration. Genome Biol. 11(12):R127

    19. Cardoso-Moreira M, Emerson JJ, Clark AG, Long M. 2011. Drosophila duplication hotspots are associated with late-replicating regions of the genome. PLoS Genet. 7(11):e1002340

    20. Cardoso-Moreira M, Long M. 2010. Mutational bias shaping fly copy number variation: implications for genome evolution. Trends Genet. 26(6):243–47

    21. Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, et al. 2012. Proto-genes and de novo gene birth. Nature 487(7407):370–74

    22. Charlesworth B, Coyne JA, Barton NH. 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130(1):113–46

    23. Charrier C, Joshi K, Coutinho-Budd J, Kim J-E, Lambert N, et al. 2012. Inhibition of SRGAP2 function by its human-specific paralogs induces neoteny during spine maturation. Cell 149(4):923–35


    24. Chen L, DeVries AL, Cheng CH. 1997. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. USA 94(8):3811–16

    25. Chen M, Zou M, Fu B, Li X, Vibranovski MD, et al. 2011. Evolutionary patterns of RNA-based duplication in non-mammalian chordates. PLoS ONE 6(7):e21466

    26. Chen S-T, Cheng H-C, Barbash DA, Yang H-P. 2007. Evolution of hydra, a recently evolved testis-expressed gene with nine alternative first exons in Drosophila melanogaster. PLoS Genet. 3(7):e107

    27. Chen S, Krinsky BH, Long M. 2013. New genes as drivers of phenotypic evolution. Nat. Rev. Genet. In press

    28. Chen S, Ni X, Krinsky BH, Zhang YE, Vibranovski MD, et al. 2012. Reshaping of global gene expression networks and sex-biased gene expression by integration of a young gene. EMBO J. 31(12):2798–809

    29. Chen S, Spletter M, Ni X, White KP, Luo L, Long M. 2012. Frequent recent origination of brain genes shaped the evolution of foraging behavior in Drosophila. Cell Rep. 1(2):118–32

    30. Chen S, Zhang YE, Long M. 2010. New genes in Drosophila quickly become essential. Science 330(6011):1682–85

    31. Cheng C-HC, Chen L. 1999. Evolution of an antifreeze glycoprotein. Nature 401:443–44

    32. Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, et al. 2007. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450(7167):203–18

    33. Conant GC, Wolfe KH. 2008. Turning a hobby into a job: how duplicated genes find new functions. Nat. Rev. Genet. 9(12):938–50

    34. Dai H, Chen Y, Chen S, Mao Q, Kennedy D, et al. 2008. The evolution of courtship behaviors through the origination of a new gene in Drosophila. Proc. Natl. Acad. Sci. USA 105(21):7478–83

    35. Demuth JP, De Bie T, Stajich JE, Cristianini N, Hahn MW. 2006. The evolution of mammalian gene families. PLoS ONE 1(1):e85

    36. Deng C, Cheng C-HC, Ye H, He X, Chen L. 2010. Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proc. Natl. Acad. Sci. USA 107(50):21593–98

    37. Dennis MY, Nuttle X, Sudmant PH, Antonacci F, Graves TA, et al. 2012. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell 149(4):912–22

    38. Díaz-Castillo C, Ranz JM. 2012. Nuclear chromosome dynamics in the Drosophila male germ line contribute to the nonrandom genomic distribution of retrogenes. Mol. Biol. Evol. 29(9):2105–8

    39. Ding Y, Zhao L, Yang S, Jiang Y, Chen Y, et al. 2010. A young Drosophila duplicate gene plays essential roles in spermatogenesis by regulating several Y-linked male fertility genes. PLoS Genet. 6(12):e1001255

    40. Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 104(50):19920–25

    41. Duret L, Chureau C, Samain S, Weissenbach J, Avner P. 2006. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312(5780):1653–55

    42. Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 320(5883):1629–31

    43. Emerson JJ, Kaessmann H, Betrán E, Long M. 2004. Extensive gene traffic on the mammalian X chromosome. Science 303(5657):537–40

    44. Fan C, Chen Y, Long M. 2008. Recurrent tandem gene duplication gave rise to functionally divergent genes in Drosophila. Mol. Biol. Evol. 25(7):1451–58

    45. Fan C, Vibranovski MD, Chen Y, Long M. 2007. A microarray based genomic hybridization method for identification of new genes in plants: case analyses of Arabidopsis and Oryza. J. Integr. Plant Biol. 49(6):915–26

    46. Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9(5):397–405

    47. Feuk L, Carson AR, Scherer SW. 2006. Structural variation in the human genome. Nat. Rev. Genet. 7(2):85–97

    48. Francino MP. 2005. An adaptive radiation model for the origin of new gene functions. Nat. Genet. 37(6):573–77

    49. Fu B, Chen M, Zou M, Long M, He S. 2010. The rapid generation of chimerical genes expanding protein diversity in zebrafish. BMC Genomics 11(1):657


    50. Gallach M, Chandrasekaran C, Betrán E. 2010. Analyses of nuclearly encoded mitochondrial genes suggest gene duplication as a mechanism for resolving intralocus sexually antagonistic conflict in Drosophila. Genome Biol. Evol. 2:835–50

    51. Gardiner A, Barker D, Butlin RK, Jordan WC, Ritchie MG. 2008. Evolution of a complex locus: exon gain, loss and divergence at the Gr39a locus in Drosophila. PLoS ONE 3(1):e1513

    52. Gilbert W. 1978. Why genes in pieces? Nature 271:501

    53. Gillespie J. 1987. Molecular evolution and the neutral allele theory. In Oxford Surveys in Evolutionary Biology, Vol. 4, ed. P Harvey, L Partridge, pp. 10–37. New York: Oxford Univ. Press

    54. Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, et al. 2007. A high-resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genet. 3(1):e3

    55. Haldane J. 1932. The time of action of genes, and its bearing on some evolutionary problems. Am. Nat. 66(702):5