Modeling evolutionary growth of a microRNA-mediated regulation system

Tetsuya Akita a, Shohei Takuno b, Hideki Innan a,c,*

a Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan
b Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697-2525, USA
c PRESTO, Japan Science and Technology Agency (JST), Saitama 332-0012, Japan

Highlights

• We introduce a model for the evolutionary growth of a miRNA gene regulation system.
• We focus on the role of gene duplication in the evolution of complexity.
• The model flexibly incorporates the positive and negative effects of gene duplication.
• The model is applied to Arabidopsis thaliana miRNAs.
• The growth of Arabidopsis miRNA systems should involve bidirectional evolutionary forces.

Article info

Article history:
Received 20 March 2012
Received in revised form 10 July 2012
Accepted 12 July 2012
Available online 20 July 2012

Keywords:
Gene duplication
Post-transcriptional regulation
Complexity

Abstract

Gene duplication plays a crucial role in the development of complex biosystems, but the evolutionary forces behind the growth of biosystems are poorly understood. In this work, we introduce a model for such growth through gene duplication. Plant microRNAs (miRNAs) are considered as a model. miRNAs are small non-coding RNAs (19–25 nucleotides) that are involved in post-transcriptional gene regulation. A single kind of miRNA can be encoded by multiple genomic regions called miRNA genes, and can regulate multiple kinds of functional gene families. It is assumed that a single miRNA system involves all these genes, that is, the miRNA genes and their target gene families. We are interested in how duplication of miRNA genes affects the evolution of the miRNA system by focusing on the numbers of miRNA genes and their target gene families, denoted by $x$ and $y$, respectively. We here theoretically explore the evolutionary growth of $(x,y)$; the former increases by duplication of the miRNA gene while the latter increases when an independent gene family acquires a novel binding site of the miRNA by mutations. We first investigate the evolutionary patterns of $(x,y)$ under three commonly assumed scenarios for the evolution of duplicated genes, that is, the positive dosage, negative dosage and neofunctionalization scenarios. The results indicate that under the three scenarios, the transient process of $(x,y)$ is unidirectional, although the direction differs depending on the model. This pattern is not consistent with the observation in the Arabidopsis thaliana genome, suggesting that a model that incorporates at least two directional evolutionary forces is needed to explain the observation. Then, such a model, called the “complexity growth model”, is introduced, in which we assume that duplication of miRNA genes is evolutionarily advantageous in that the system can encode a complex and sophisticated pattern of regulation because multiple miRNA genes can have different expression patterns. This is helpful to optimize the regulation of a few particular functional gene families, but there is a cost; once the system is optimized for one purpose, it could be difficult for other purposes to use it. That is, duplication of miRNA genes would narrow down the potential gene families that can join the system. Our theoretical analysis revealed that this model can explain the observation of Arabidopsis miRNAs. Although we consider plant miRNAs as an example in this work, the model can be readily applied to other regulation systems with some modifications. Further development of such models would provide insights into the evolutionary growth of the complexity of biosystems.

© 2012 Elsevier Ltd. All rights reserved.

Journal of Theoretical Biology 311 (2012) 54–65
Journal homepage: www.elsevier.com/locate/yjtbi
0022-5193/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jtbi.2012.07.011

* Corresponding author at: Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan. Tel.: +81 46 858 1600.
E-mail address: [email protected] (H. Innan).

1. Introduction

In the evolution of organisms throughout the tree of life, there is a general trend from simple to complex. Eukaryotes have evolved complicated and sophisticated biosystems by increasing the number of genes. Any gene is derived from a pre-existing gene, and the number of genes increases by gene duplication. Then, some of them acquire novel functions and can be preserved for a long time. Thus, it is obvious that gene duplication has played a crucial role in the development of complex biosystems (e.g. Lynch, 2007), but the evolutionary forces behind the growth of biosystems are poorly understood. We here introduce a novel model for this process.

In this work, we consider microRNAs (miRNAs) as a model of biosystems. miRNAs are small non-coding RNAs (19–25 nucleotides) that are involved in post-transcriptional gene regulation. They are derived from RNAs with imperfect foldback structures, and can bind to messenger RNAs (mRNAs) that have sequences complementary to the miRNA sequence, causing a reduction of the expression of the target gene through two mechanisms, translation repression and/or transcript cleavage (Bartel, 2004). miRNAs are abundant in a number of eukaryote species, particularly in metazoans and plants (Tanzer et al., 2010). Because miRNAs have drawn a tremendous amount of attention in molecular biology, miRNAs and their target genes have been well identified in many species.

There are a number of differences in the miRNA-mediated regulation process between plants and animals, and we find that the plant systems are more suitable for our initial modeling because the miRNA–target gene interactions are strictly defined and can be empirically verified with less ambiguity. Plant miRNAs require a higher sequence homology between the miRNA and its binding site in a target gene than animal miRNAs do (Carrington and Ambros, 2003; Bartel, 2004; He and Hannon, 2004). The binding sites of plant miRNAs tend to be located in the coding regions, and a relatively strict rule is applied for the base pairing; at most three or four mismatched bases are generally allowed (Reinhart et al., 2002; Rhoades et al., 2002; Bonnet et al., 2004; Jones-Rhoades and Bartel, 2004; Schwab et al., 2005; Llave et al., 2002). As plant miRNAs directly cause digestion of the mRNAs by a certain enzyme, such specific cleavage sites can be strong evidence for miRNA regulation (Jones-Rhoades and Bartel, 2004; Schwab et al., 2005; Llave et al., 2002), which can be directly identified through relatively straightforward empirical and bioinformatics screens (reviewed in Chen and Rajewsky, 2007). Consequently, a near comprehensive list of miRNA–target gene interactions is available for several plant species, providing a unique opportunity to explore the evolutionary process of miRNA-mediated gene regulation systems.

We start modeling the evolution of a miRNA-mediated gene regulation system by taking advantage of abundant information from the model species, Arabidopsis thaliana. In this species, a number of potential miRNAs have been identified so far, e.g. as cataloged in the miRBase database (Kozomara and Griffiths-Jones, 2011). Comparative genomics revealed that some of them are shared by monocots and dicots, or even by bryophytes (Floyd and Bowman, 2004; Axtell and Bartel, 2005; Axtell and Bowman, 2008), indicating their very ancient origins. For such ancient miRNAs, extensive duplication of miRNA genes and their target genes has been demonstrated (Tanzer and Stadler, 2004; Jiang et al., 2006; Maher et al., 2006; Takuno and Innan, 2008), creating a complex regulatory system with multiple interactions. For example, according to the ASRP database (Backman et al., 2008), the miR398 system consists of three miRNA genes located on chromosomes 2 and 5 that encode three distinct miRNAs (miR398a, miR398b and miR398c) (illustrated in Fig. 1a). It is obvious that the three distinct miRNAs (or miRNA genes) were created by duplications, thereby sharing the same seed region that originated from a common ancestor. Therefore, we can consider these three as a single “kind” of miRNA, named miR398 in this case. We assume that a duplication of a miRNA gene means that a duplicated copy includes its own cis-regulatory elements so that it can produce miRNAs by itself. Immediately after the duplication event, the duplicated copy should have the same expression pattern and the same seed sequence as the original, but they will change over time. We can predict that the expression pattern can change relatively quickly, while the seed sequence would allow only minor base changes so that it keeps encoding the same kind of miRNA. For the case of miR398, all three miRNA genes (miR398a, miR398b and miR398c) are considered to regulate the CytC oxidase gene family (one gene) and the CSD gene family (one gene), but the minor base changes should change the affinity to the binding sites in the target genes, thereby affecting the regulation efficacy. It should be straightforward to predict that the efficacy may be maximized when the two sequences match perfectly, and that some mismatches would lower the efficacy (see Figure 7–112 in “Molecular Biology of the Cell”, Alberts et al., 2008). The “presumable” efficacy for all six possible regulation paths predicted by this idea is represented by lines with different thicknesses in Fig. 1. Thus, the regulatory system is quite complicated when there are multiple miRNA genes involved. Similarly, the miR393 system consists of two miRNA genes on chromosomes 2 and 3 that encode miR393a and miR393b, respectively (illustrated in Fig. 1b). The target gene families are the F-box protein gene family (four genes) and the bHLH protein gene family (one gene). In this case, we can predict that at least three gene duplications occurred in the F-box protein gene family after the establishment of the initial interaction with miR393.

Fig. 1. Examples of miRNA regulation systems in A. thaliana. (a) miR398 system with three miRNA genes and two target gene families. (b) miR393 system with two miRNA genes and two target gene families. In the left panel, the genomic locations of miRNA genes and target genes are shown in black and gray boxes, respectively. The right panel illustrates the interaction between miRNAs and target genes. The thickness of lines between them represents the strength of interaction presumed by the number of base mismatches between the miRNA and its binding site. Mismatches are shown in gray.

To understand how a miRNA system evolves, we here focus on the number of miRNA genes in a genome and the number of different gene families regulated by the miRNAs (Takuno and Innan, 2008). The former is particularly important because this regulation system is uni-directional (except for miR162 that regulates ARGONAUTE protein, a major factor of miRNA–protein complex, Xie et al., 2003), so that it is straightforward to imagine that the number of miRNA genes is the major factor to determine the complexity of the regulation system. It is known that different miRNA genes have specific expression patterns (Maher et al., 2006), so that with multiple miRNA genes, a complicated regulatory system can exist. The latter (the number of target gene families) is important because it should reflect the flexibility of the regulation system. For the two examples in Fig. 1 (the miR398 and miR393 systems), we can consider that they are flexible enough to regulate two kinds of functional gene families.

From the view of evolutionary genetics, we can assume that any miRNA system originated from a pair involving a single miRNA gene and a single target gene (Takuno and Innan, 2008). Then, duplications of the miRNA gene follow, thereby creating a complicated system with multiple interactions. In the meantime, it occasionally happens that an independent gene joins the system by acquiring a binding site by mutations. In this study, we investigate how these two processes affect the evolutionary growth of a miRNA system. It should be noted that target genes also duplicate and form a multigene family as shown in Fig. 1b. However, we did not consider their duplications important because those duplicates usually keep similar functions. Based on these predictions, we develop a mathematical model to trace the evolutionary growth of a miRNA system, in which various effects of duplications of miRNA genes are incorporated. The model is applied to the data of Arabidopsis, and the evolutionary forces behind the growth of miRNA systems are discussed.

2. Model

2.1. Overview

We develop a theoretical model of the evolutionary growth of a miRNA system by focusing on the two-dimensional transient changes of the numbers of miRNA genes and their target gene families, denoted by $x$ and $y$, respectively. It is reasonable to suppose that any system should originate from an initial establishment of the interaction between a single miRNA gene ($x=1$) and a single target gene ($y=1$). $x$ increases by duplication of miRNA genes, and $y$ increases by a mutation that creates a new interaction by producing a potential binding site. In this work, we investigate how a miRNA system evolves through the change of $(x,y)$. Table 1 summarizes all symbols used in our modeling.

2.1.1. Definition of the evolutionary state

In this work, we are interested in the transient process of $(x,y)$. The definition of $x$ is simple; it is the number of genomic regions that code for the focal miRNA. The definition of $y$ is slightly more complicated because, for example, when $y=2$, there are many different situations depending on what kinds of functional gene families are targeted by the miRNA. We here need to define the functions in the cell. Suppose that there are $n+1$ possible functions required to maintain the cell (Fig. 2a). $n$ can be considered as a very large number, which is treated as a constant through the evolutionary process. First, consider the initial state $(x,y)=(1,1)$, where an interaction between one miRNA and one target gene is established. We refer to the function of this target gene as the original function (index 0) among all $n+1$ functions. Then, it would be possible to sort all functions in terms of the distance from the original function (Fig. 2b), and the vector of the functional distances is denoted by $\mathbf{d}=(d_0,d_1,\ldots,d_i,\ldots,d_n)$, where $d_i$ is the functional distance of the $i$th function from the original function (i.e. $d_i<d_{i+1}$). The first element is for the original target gene, such that $d_0=0$. This vector is also treated as constant through the evolutionary process once the initial interaction is established.

Given this vector, we consider the evolutionary change of the “on” and “off” states of the regulation by the focal miRNA, which is represented by the vector $\mathbf{L}=(l_0,l_1,\ldots,l_i,\ldots,l_n)$. Although $l_i$ should be a quantitative parameter considering the biological mechanism of miRNA regulation, for convenience, we assume $l_i=1$ if the $i$th function is under regulation and $l_i=0$ if not. Once $\mathbf{L}$ is given, $y$ is uniquely determined as $\sum_{i=0}^{n} l_i = y$, but there are many different $\mathbf{L}$ for a single value of $y$. For example, even for a simple case with $y=2$, possible states for $\mathbf{L}$ are $\mathbf{L}=(1,1,0,0,0,\ldots)$, $(1,0,1,0,0,\ldots)$, $(1,0,0,1,0,\ldots)$, and so on. In a general setting, given $y$, there are $M={}_{n}C_{y-1}$ possible $\mathbf{L}$, and they are denoted by $\mathbf{L}_y^{(m)}$, where $m \in \{1,2,3,\ldots,M\}$. Thus, the state of target genes is better described by $\mathbf{L}$, and $y$ is considered to be a summary variable used for convenience, which will be used only when we summarize the results.
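The counting argument above can be checked with a short sketch in Python. The helper name `regulation_states` and the small values of `n` and `y` are illustrative assumptions; the only facts taken from the text are that $l_0=1$ for the original target gene family and that the remaining $y-1$ regulated functions are chosen among the other $n$ functions.

```python
from itertools import combinations
from math import comb

def regulation_states(n, y):
    """Yield every vector L = (l_0, ..., l_n) with l_0 = 1 and sum(L) == y."""
    for chosen in combinations(range(1, n + 1), y - 1):
        state = [0] * (n + 1)
        state[0] = 1  # the original target gene family is always regulated
        for i in chosen:
            state[i] = 1
        yield tuple(state)

# Hypothetical small example: n + 1 = 5 functions, y = 2 regulated families.
n, y = 4, 2
states = list(regulation_states(n, y))
for state in states:
    print(state)
# The number of states agrees with M = nC(y-1).
print(len(states) == comb(n, y - 1))
```

For realistic (very large) $n$ the full enumeration is impractical, which is one reason a summary variable such as $y$ is convenient when reporting results.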

2.1.2. State changes by fixation of a mutation

In our modeling of the evolution of a miRNA regulation system, we set the initial state where a new interaction is established by a single miRNA gene and a single target gene (i.e. $x=1$ and $y=1$). Then, the system starts evolving by increasing and decreasing the copy numbers of miRNA genes (Fig. 3c, e) and by changing which target gene families are under their control (Fig. 3a, b, f, g). Fig. 3 illustrates all possible events that change the state.

Duplications of miRNA genes occur randomly in the genome at rate $v_x^+$ per gene, so that when there are $x$ miRNA genes in the genome, the rate at which $x$ increases to $x+1$ is given by $xv_x^+$. The deletion rate per miRNA gene is denoted by $v_x^-$, which is usually considered to be much higher than the duplication rate ($v_x^+$). Thus, the state of $x$ changes by mutations (duplication and deletion), but it is important to notice that only part of them can contribute to long-term evolution, because most mutations would be eliminated from the population by selection and genetic drift. In our model, we define that the state shifts only when a mutation is fixed in the population; therefore, we need to incorporate the fixation probability, $u(s)$, where $s$ is the selective advantage that the mutant confers. According to the basic theory of population genetics (Crow and Kimura, 1970), $u(s)$ is given by a function of $s$:

$$u(s)=\frac{1-\exp[-2s]}{1-\exp[-4Ns]} \tag{1}$$

in a random mating diploid population with size $N$. This equation means that the fixation probability $u(s)$ of a neutral mutation is $1/(2N)$ and that $u(s)$ increases almost linearly with $s$ when $s$ is positive and $4Ns$ is large ($u(s)\approx 2s$ for small $s$). If $s$ is negative, $u(s)$ decreases and becomes almost 0 for even moderate absolute values of $s$. Then, the rate of such an evolutionary change of the state is given by the product of $2N$, the mutation rate ($v_x^+$ or $v_x^-$, for duplication and deletion, respectively) and the fixation probability given by Eq. (1). That is, the per-generation rates of the change of $x$ to $x+1$ and to $x-1$ are given by

$$\lambda_x^+ = 2Nxv_x^+\,u(s_x^+) \tag{2}$$

and

$$\lambda_x^- = 2Nxv_x^-\,u(s_x^-), \tag{3}$$

respectively. $s_x^+$ and $s_x^-$ are the selection coefficients of duplication and deletion of the miRNA genes.
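As a minimal sketch of Eqs. (1)–(3), the fixation probability and the transition rates for $x$ could be computed as follows in Python. The function names and the two numerical guards (a neutral-limit branch and an overflow cut-off for strongly deleterious mutants) are illustrative assumptions and are not part of the model description.

```python
import math

def fixation_probability(s, N):
    """Eq. (1): fixation probability of a new mutant with selective advantage s."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * N)  # neutral limit of Eq. (1)
    if -4.0 * N * s > 700.0:
        return 0.0  # strongly deleterious: the probability underflows to zero
    # expm1(a) = exp(a) - 1, so the ratio equals (1 - e^{-2s}) / (1 - e^{-4Ns}).
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)

def rate_x_up(x, N, v_dup, s_dup):
    """Eq. (2): per-generation rate of the change x -> x + 1."""
    return 2 * N * x * v_dup * fixation_probability(s_dup, N)

def rate_x_down(x, N, v_del, s_del):
    """Eq. (3): per-generation rate of the change x -> x - 1."""
    return 2 * N * x * v_del * fixation_probability(s_del, N)

# Hypothetical parameter values, for illustration only.
N = 1000
for s in (-0.01, 0.0, 0.01):
    print(s, fixation_probability(s, N))
```

With a neutral mutation the factor $2N\,u(0)$ equals 1, so the rate in Eq. (2) reduces to the total mutation rate $xv_x^+$.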

Table 1
List of mathematical symbols.

Basic parameters
  $x$    Number of miRNA genes in the system
  $y$    $\sum_{i=0}^{n} l_i$: Number of target gene families in the system
  $N$    Effective population size

Functional space
  $\mathbf{d}$    $(d_0,d_1,\ldots,d_i,\ldots,d_n)$: Vector of functional distances
  $d_i$    Functional distance of the $i$th function from the original function
  $\mathbf{L}_y^{(m)}$    $(l_0,l_1,\ldots,l_i,\ldots,l_n)$: Vector of regulation state, corresponding to $\mathbf{d}$ ($l_i \in \{0,1\}$; $m \in \{1,2,\ldots,M\}$)
  $M$    ${}_{n}C_{y-1}$: The number of all possible $\mathbf{L}$ given $y$
  $n+1$    The number of functions required to maintain the cell

Transition rate
  $\lambda_x^+,\lambda_x^-$    Transition rates to increase and decrease $x$ in Eqs. (2) and (3)
  $\lambda_y^+,\lambda_y^-$    Transition rates to increase and decrease $y$ in Eqs. (4) and (5)
  $v_x^+,v_x^-$    Duplication and deletion rates of a miRNA gene
  $v_y^+,v_y^-$    Rates of gain and loss of a target gene
  $u(s)$    Fixation probability of a mutant in Eq. (1)
  $s$    Selection coefficient defined by Eq. (7)
  $f$    Fitness of the current state defined by Eq. (6)

Transition process
  $P_{(x,\mathbf{L}_y^{(m)})}(t)$    State probability of the miRNA system at time $t$
  $\mathbf{P}(t)$    Vector including all possible combinations of $x$ and $\mathbf{L}$, arranged according to $P_{(x,\mathbf{L}_y^{(m)})}(t)$
  $\mathbf{M}$    Transition probability matrix

Fitness function
  $F(d|x)$    Fitness landscape defined by Eq. (10)
  $f^*(x)$    $F(0|x)$: Fitness when the original function is under regulation of $x$ miRNA genes
  $d^*(x)$    Functional distance that satisfies $F(d|x)=0$
  $y^*$    $\max\{d_i \mid F(d_i|x)>0\}+1$: Maximum number of target gene families that is advantageously regulated
  $\xi(x),\phi(x)$    Subfunctions involved in $F(d|x)$ (see Table 2)
  $h,\alpha,\beta,\gamma$    Coefficients that determine the shape of $F(d|x)$

Fig. 2. Illustration to explain the definition of functional distance, $d$. (a) It is assumed that there are $n+1$ functions to maintain the cell, and that a miRNA establishes an interaction with a gene with the original function (function 0). (b) The functional distances between the original function and the rest are defined as $d$, which can be placed on a one-dimensional space.

The same logic can be applied to the change of $y$, the number of target gene families. We assume that a gene comes under regulation of the miRNA when it acquires a binding site by point mutations (and possibly by indels) at rate $v_y^+$. $v_y^-$ is the backward mutation rate, at which a target gene family becomes unregulated by losing the binding site. It should be noted that this process is not very simple when a target gene family consists of multiple gene copies, because all of them have to lose their binding sites. This means that the rate would be small for a large multigene family. Nevertheless, for mathematical convenience, this rate is assumed to be constant at $v_y^-$. $s_y^+$ and $s_y^-$ are the selection coefficients for the two events. Then, the per-generation rates of the change of $y$ to $y+1$ and to $y-1$ are given by

$$\lambda_y^+ = 2Nv_y^+\,u(s_y^+) \tag{4}$$

and

$$\lambda_y^- = 2Nv_y^-\,u(s_y^-), \tag{5}$$

respectively. It should be noted that there are two ways to change $y$, point mutations in the miRNA gene (Fig. 3a, f) and in the target gene (Fig. 3b, g). Here, we did not distinguish them, and $v_y^+$ and $v_y^-$ are considered to include both events.

Note that throughout this work, it is assumed that both $x$ and $y$ can change by only one in a single step. This assumption should work reasonably well because the rates of duplication and point mutation are usually low. Even when multiple changes occur occasionally, the stochastic process is likely to be well approximated by a single-step process.

2.2. Fitness effects of mutations

Fig. 4 illustrates all possible changes from state $(x,\mathbf{L}_y^{(m)})$. If the next event is the fixation of a duplication of a miRNA gene, the state moves to the right, $(x+1,\mathbf{L}_y^{(m)})$, and if it is the fixation of a deletion, the state moves to the left, $(x-1,\mathbf{L}_y^{(m)})$.

Fig. 3. All possible events that cause a shift of $(x,y)$. An example is shown when the current state is $(x,y)=(2,2)$. Changes to increase and decrease the number of target gene families are represented by black arrows, and white arrows are for those to increase and decrease the number of miRNA genes. Gray stars are mutations that cause losses of interaction (represented by gray ×'s).

Fig. 4. Transition diagram showing possible pathways from the current state $(x,\mathbf{L}_y^{(m)})$. The transition probabilities are also shown.

The situation is slightly more complicated for $y$. There are a number of possibilities for the changes of $\mathbf{L}_y^{(m)}$. If the next event is the joining of a new target gene family (i.e. $y \to y+1$), there are in total $n+1-y$ possibilities. For a decrease of $y$, there are $y-1$ possibilities because we assume that the initial interaction with the original gene family cannot be lost, which should be a reasonable assumption according to comparative genomic observations (e.g. Axtell and Bowman, 2008). All these possible changes are incorporated in our modeling, and the transition probabilities to these next states are given conditional on the fitness landscape, which exhibits how advantageous or deleterious the next events are.
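The single-step neighbourhood described above can be sketched in Python as follows. The function name `neighbours` and the dictionary layout are illustrative choices. The rule that the last miRNA gene is never deleted (`x > 1`) is an assumption made here for the sketch and is not stated in the text; the rule that $l_0$ is never lost is taken from the text.

```python
def neighbours(x, L):
    """Return the states reachable from (x, L) in one step, grouped by event."""
    L = tuple(L)
    moves = {"x_up": [(x + 1, L)], "x_down": [], "y_up": [], "y_down": []}
    if x > 1:  # assumption of this sketch: the last miRNA gene is not deleted
        moves["x_down"].append((x - 1, L))
    for i, l_i in enumerate(L):
        flipped = L[:i] + (1 - l_i,) + L[i + 1:]
        if l_i == 0:
            moves["y_up"].append((x, flipped))    # a new family joins the system
        elif i > 0:  # l_0, the original interaction, is never lost
            moves["y_down"].append((x, flipped))  # a family escapes regulation
    return moves

# Example state used in the text, truncated to n + 1 = 8 functions.
moves = neighbours(3, (1, 0, 1, 1, 0, 1, 0, 0))
print(len(moves["y_up"]), len(moves["y_down"]))  # n + 1 - y and y - 1
```

For this truncated example ($n+1=8$, $y=4$) the counts are $n+1-y=4$ gains and $y-1=3$ losses, in agreement with the paragraph above.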

In the previous section, we showed that the selection coefficients ($s_x^+$, $s_y^+$, $s_x^-$ and $s_y^-$) play the major role in determining the transition probabilities. The selection coefficient of a mutation is defined as the difference in fitness before and after the mutation, which largely depends on the fitness landscapes in the current state $(x,\mathbf{L}_y^{(m)})$ and in all possible states to which the system can move by a single step. We explain how the selection coefficient is determined with an illustration of an example, where the current state is $(3,(1,0,1,1,0,1,0,0,\ldots))$, i.e. four different gene families ($y=4$) are regulated by three miRNA genes ($x=3$).

Examples of the fitness landscape are illustrated in Fig. 5 as a function of $d$. Technically, the fitness landscape defines the fitness effect of each of the $n+1$ functions when regulated by the miRNA. The landscape can change when the number of miRNA genes changes, so we assume that the fitness landscape is given conditional on $x$, denoted by $F(d|x)$. Given $F(d|x)$, the fitness of the current state (denoted by $f$) can be defined by assuming additivity of the selective effect:

$$f=\sum_{i=0}^{n} F(d=d_i|x)\,l_i. \tag{6}$$

For the example of the current state $(3,(1,0,1,1,0,1,0,0,\ldots))$ (Fig. 5e), the fitness is given by $f=F(d_0|3)+F(d_2|3)+F(d_3|3)+F(d_5|3)$. Then, suppose a mutation occurs, which provides a potential shift of the state if it is fixed in the species. The selection coefficient, $s$, is the major factor to determine the likelihood of fixation. Letting $f'$ be the fitness after the mutation, the selection coefficient is simply defined as

$$s=f'-f. \tag{7}$$

Let us next consider how $s$ is given when a new gene comes under regulation. Suppose that a gene with the 4th function acquires a binding site (denoted by the filled star in Fig. 5b); then the state could move to $(x=3,\mathbf{L}_{y=5}=(1,0,1,1,1,1,0,0,\ldots))$, in which the fitness is given by $f'=F(d_0|3)+F(d_2|3)+F(d_3|3)+F(d_4|3)+F(d_5|3)$. Therefore, this state change has a fitness advantage $s_y^+=f'-f=F(d_4|3)$ over the current state, which can be directly plugged into Eq. (4) to obtain the transition rate, $\lambda_y^+$. According to the illustration in Fig. 5b, this change is selectively advantageous because $F(d_4|3)>0$.

In the other direction, to decrease $y$ to three, we again follow the same fitness function $F(d|x)$. Fig. 5h illustrates a case where genes with the second function escape from the regulation (denoted by the filled star in Fig. 5h), so that the fitness is given by $f'=F(d_0|3)+F(d_3|3)+F(d_5|3)$. Therefore, this state change has a fitness advantage $s_y^-=f'-f=-F(d_2|3)$ over the current state, from which we have the transition rate $\lambda_y^-$ by Eq. (5).

There is only one way in each direction for the change of $x$. In both cases, $F(d|x)$ changes. If $x$ increases to four, the fitness landscape follows $F(d|4)$, and the fitness in state $(4,(1,0,1,1,0,1,0,0,\ldots))$ is given by $f'=F(d_0|4)+F(d_2|4)+F(d_3|4)+F(d_5|4)$, from which $s_x^+$ can be obtained. The change of $F(d|x)$ depends on what we predict under a certain biological scenario as will be described in the next section, and Fig. 5 shows three different patterns (Fig. 5c, f, i). Similarly, $s_x^-$ is given when $x$ decreases to 2 (Fig. 5a, d, g), in which the fitness is given by $f'=F(d_0|2)+F(d_2|2)+F(d_3|2)+F(d_5|2)$.
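The bookkeeping in Eqs. (6) and (7) can be illustrated with a short Python sketch. The landscape function, its coefficients and the distance vector below are hypothetical placeholders chosen only so that the code runs; they are not the landscapes shown in Fig. 5, so the signs of the resulting coefficients need not match the figure.

```python
import math

def landscape(d, x):
    """Hypothetical F(d|x): bell-shaped in d, taller and narrower as x grows."""
    return x * (math.exp(-((0.2 * x * d) ** 2)) - 0.5)

def fitness(x, L, distances, F=landscape):
    """Eq. (6): sum of F(d_i|x) over the regulated functions (l_i = 1)."""
    return sum(F(d, x) for d, l in zip(distances, L) if l == 1)

def selection_coefficient(before, after, distances, F=landscape):
    """Eq. (7): s = f' - f, for states given as (x, L) pairs."""
    (x0, L0), (x1, L1) = before, after
    return fitness(x1, L1, distances, F) - fitness(x0, L0, distances, F)

# Hypothetical distances d_0 < d_1 < ... < d_7 with d_0 = 0.
distances = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
current = (3, (1, 0, 1, 1, 0, 1, 0, 0))
gain_4 = (3, (1, 0, 1, 1, 1, 1, 0, 0))    # function 4 joins the system
lose_2 = (3, (1, 0, 0, 1, 0, 1, 0, 0))    # function 2 escapes regulation
dup_x = (4, (1, 0, 1, 1, 0, 1, 0, 0))     # a miRNA gene is duplicated

s_gain = selection_coefficient(current, gain_4, distances)
s_loss = selection_coefficient(current, lose_2, distances)
s_dup = selection_coefficient(current, dup_x, distances)
print(s_gain, s_loss, s_dup)
```

Because fitness is additive, `s_gain` reduces to $F(d_4|3)$ and `s_loss` to $-F(d_2|3)$ up to floating-point rounding, whereas `s_dup` involves every regulated function because the whole landscape changes with $x$.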

Fig. 5. Illustration of transition processes under the three scenarios: the positive dosage, negative dosage and neofunctionalization scenarios. An example with the current state ($x = 3$, $L_{y=4} = (1,0,1,1,0,1,0,0,\ldots)$) is shown. The changes of y are illustrated in (b) and (h). Open circles represent functions that are not regulated ($l_i = 0$), and filled circles are regulated ones ($l_i = 1$). Filled stars indicate the changes in the regulated state ($l_4 = 0 \to 1$ in (b), $l_2 = 1 \to 0$ in (h)). (a), (d) and (g) are for changes $x = 3 \to 2$, and (c), (f) and (i) are for changes $x = 3 \to 4$. Different fitness landscapes are given for each scenario, in which the increase and decrease of x change the fitness landscape.

T. Akita et al. / Journal of Theoretical Biology 311 (2012) 54–65 59

The selection coefficient s for all possible changes from a certain state $(x, L_y^{(m)})$ can be computed from Eq. (7) once the vector of functional distance and the fitness landscapes for all possible x are given. Then, it is straightforward to trace the evolution of the system from the initial state, $(1,1) = (1,(1,0,0,0,0,0,0,\ldots))$. Let t be the time (the number of generations) since the initial state, let $P_{(x,L_y^{(m)})}(t)$ be the probability that the state is $(x, L_y^{(m)})$ at time t, and let $\mathbf{P}(t)$ be the vector that includes all possible states. Then, the transient process of the system can be written as

$$\mathbf{P}(t+1) = M\mathbf{P}(t), \qquad (8)$$

where M is the transition probability matrix that can be obtained from Eqs. (2)–(5) and (7). Using this equation, we explore the evolutionary behavior of the system under various conditions of the fitness landscape. In the following section, for convenience in showing the results, the vector representing the presence or absence of gene families under regulation is summarized by y, the number of regulated gene families, whose probability is given by

$$P_{(x,y)}(t) = \sum_m P_{(x,L_y^{(m)})}(t). \qquad (9)$$
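As a purely illustrative sketch of Eqs. (8) and (9), the following Python snippet iterates the recursion on a toy state space and then sums over regulation vectors that share the same y. The four states and all entries of `M` are hypothetical placeholders, chosen only so that each column sums to one; they are not derived from Eqs. (2)–(5) and (7), and the function names are assumptions made for illustration.

```python
# Hypothetical toy state space: each state is (x, L), with L the regulation vector.
STATES = [
    (1, (1, 0, 0)),
    (1, (1, 1, 0)),
    (1, (1, 0, 1)),
    (2, (1, 0, 0)),
]

# Hypothetical transition matrix. Column j lists the probabilities of moving from
# state j to each state i, so every column sums to one.
M = [
    [0.97, 0.02, 0.02, 0.03],
    [0.01, 0.98, 0.00, 0.00],
    [0.01, 0.00, 0.98, 0.00],
    [0.01, 0.00, 0.00, 0.97],
]


def step(M, P):
    """One generation of Eq. (8): P(t+1) = M P(t)."""
    return [sum(M[i][j] * P[j] for j in range(len(P))) for i in range(len(M))]


def iterate(M, P0, generations):
    """Apply Eq. (8) repeatedly, starting from the distribution P0."""
    P = list(P0)
    for _ in range(generations):
        P = step(M, P)
    return P


def marginalize(P, states):
    """Eq. (9): sum over all vectors L that share the same (x, y), with y = sum(L)."""
    P_xy = {}
    for prob, (x, L) in zip(P, states):
        key = (x, sum(L))
        P_xy[key] = P_xy.get(key, 0.0) + prob
    return P_xy


P0 = [1.0, 0.0, 0.0, 0.0]          # all mass on the initial state (1, (1, 0, 0))
P100 = iterate(M, P0, 100)
P_xy = marginalize(P100, STATES)
print(P_xy)
```

In this toy example the two vectors (1,1,0) and (1,0,1) are distinct states of the chain but contribute to the same summarized state (x,y) = (1,2).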

3. Results

We use Eq. (9) to investigate the evolutionary behavior of (x,y) under various scenarios, which can be specified by the function representing the fitness landscape conditional on the number of miRNA genes, $F(d|x)$. We assume that the fitness landscape is given by

$$F(d|x) = \xi(x)\left(\exp\left[-(\phi(x)d)^2\right] - h\right), \qquad (10)$$

which is flexible enough to describe typical landscapes of fitness under commonly considered evolutionary scenarios of gene duplication (reviewed in Innan and Kondrashov, 2010). This formula produces a symmetrical bell-shaped curve with a single peak at d = 0. We denote the peak as $f^*(x)$ $(= F(0|x))$, which represents the fitness effect when the original function is under the regulation of the miRNA. $f^*(x)$ increases with increasing x if $d\xi(x)/dx > 0$, and $\xi(x)$ specifies how much $f^*(x)$ increases with increasing x. As d increases, $F(d|x)$ decreases and crosses the horizontal axis at $F(d|x) = 0$. h is used to determine the absolute level of selection intensity. Let us define $d^*(x)$ such that $F(d^*(x)|x) = 0$; then from Eq. (10), $d^*(x)$ is given by $\sqrt{\log(1/h)}/\phi(x)$ ($d^*(x)$ exists when $0 < h < 1$). This means that $\phi(x)$ specifies how $d^*(x)$ changes with x, and that $d^*(x)$ is a decreasing function of x if $d\phi(x)/dx > 0$ and an increasing function if $d\phi(x)/dx < 0$. Thus, Eq. (10) is suitable to characterize the shape of the fitness function by adjusting $f^*(x)$ and $d^*(x)$. An advantage is that the parameters that determine $f^*(x)$ are not involved in $d^*(x)$, and those that determine $d^*(x)$ are independent of $f^*(x)$. Therefore, $f^*(x)$ and $d^*(x)$ can be set independently with no confounding effects between them.
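The roles of the two functions in Eq. (10) can be made concrete with a short Python sketch. The function names are assumptions introduced here for illustration, and the arguments `xi` and `phi` stand for $\xi(x)$ and $\phi(x)$ already evaluated at a given x; the numerical values in the last lines are example inputs only.

```python
import math


def fitness(d, xi, phi, h):
    """Eq. (10): F(d|x) = xi(x) * (exp(-(phi(x) * d)**2) - h)."""
    return xi * (math.exp(-(phi * d) ** 2) - h)


def peak_height(xi, h):
    """f*(x) = F(0|x) = xi(x) * (1 - h); phi(x) does not enter."""
    return xi * (1.0 - h)


def zero_crossing(phi, h):
    """d*(x) = sqrt(log(1/h)) / phi(x); xi(x) does not enter."""
    if not 0.0 < h < 1.0:
        raise ValueError("d* exists only for 0 < h < 1")
    return math.sqrt(math.log(1.0 / h)) / phi


# Example inputs (illustrative values).
xi, phi, h = 0.02, 0.15, 0.5
f_star = peak_height(xi, h)
d_star = zero_crossing(phi, h)
print(f_star, d_star, fitness(d_star, xi, phi, h))
```

The last printed value should be zero up to rounding error, since $d^*(x)$ is by definition the distance at which the landscape crosses the horizontal axis.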

We set the fitness function (Eq. (10)) to be consistent with the commonly considered scenarios of the effect of gene duplication (the positive dosage, negative dosage, and neofunctionalization scenarios), and explore the behavior of (x,y) with biologically reasonable parameter sets, as described below. In all scenarios, the distribution of functional distances is conventionally assumed such that $\mathbf{d} = (d_0, d_1, d_2, \ldots, d_i, \ldots, d_n) = (0, 1, 2, \ldots, i, \ldots, n)$. We also introduce the maximum number of target gene families that are advantageously regulated, $y^*$, given by

$$y^* = \max\{d_i \mid F(d_i|x) > 0\} + 1, \qquad (11)$$

where the second term (i.e. +1) on the right-hand side is for the original function located at d = 0. Thus, $y^*$ is uniquely determined once the x-intercept $d^*(x)$ is given. We can consider that $y^*$ provides the "upper boundary" of y. For convenience, we set parameters such that the original interaction has a positive effect of 1% (i.e. $\xi(1) = \xi_0 = 0.02$ and $h = 0.5$, so that $f^*(1) = \xi(1)(1-h) = 0.01$). We also assume that $d^*(1) \approx 5.5$ (i.e. $\phi(1) = \phi_0 = 0.15$ and $h = 0.5$, so that $d^*(1) = \sqrt{\log(1/h)}/\phi(1) \approx 5.5$), indicating that regulation by the miRNA has a positive effect for the first six functions ($i = 0$ to $5$, so that $y^* = 6$), is slightly deleterious for the function with $i = 6$ ($F(6|x) \approx -0.001$ when x = 1), and is quite deleterious for the rest with $i \geq 7$. Fig. 6a, c, and e show how $F(d|x)$ changes under the three scenarios considered in this work (see below). These settings are fixed throughout this work (see also Table 2).
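The numbers quoted in this paragraph can be checked with a small Python sketch of Eqs. (10) and (11). The function names and the upper end of the distance vector (n = 20) are assumptions made for illustration; the parameter values are those given in the text for x = 1.

```python
import math

XI0, PHI0, H = 0.02, 0.15, 0.5   # xi(1), phi(1) and h as quoted in the text


def fitness(d, xi=XI0, phi=PHI0, h=H):
    """Eq. (10) evaluated with xi(x) and phi(x) supplied as numbers."""
    return xi * (math.exp(-(phi * d) ** 2) - h)


def y_star(distances, xi=XI0, phi=PHI0, h=H):
    """Eq. (11): largest distance with positive fitness, plus one for d = 0."""
    positive = [d for d in distances if fitness(d, xi, phi, h) > 0]
    return max(positive) + 1


distances = range(0, 21)          # d_i = i; n = 20 is an arbitrary choice here
print(y_star(distances))          # maximum number of advantageously regulated families
print(round(fitness(6), 4))       # fitness effect of the function at d = 6
print(round(fitness(7), 4))       # fitness effect of the function at d = 7
```

With these settings the functions at d = 0 to 5 have positive fitness effects, and the effect turns negative from d = 6 onward, in line with the description above.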

We consider that the forward mutation rates to increase x and y ($v_x^+$ and $v_y^+$, respectively) are on the order of $10^{-7}$–$10^{-5}$, and show results for two values of $v_x^+$ ($5.0 \times 10^{-7}$ and $1.0 \times 10^{-7}$) and one of $v_y^+$ ($5.0 \times 10^{-7}$). In all cases, it is assumed that the backward mutation rates ($v_x^-$ and $v_y^-$) are 100 times larger than the forward mutation rates, i.e. $v_x^- = 100 \times v_x^+$ and $v_y^- = 100 \times v_y^+$. We assume the population size N = 1000. These parameters were chosen to demonstrate the general evolutionary behavior of (x,y). It should be noted that although these parameters were specified separately, as in other population genetic theories, the important parameter is the product of the population size and mutation rate.

3.1. Typical behavior of (x,y) under commonly assumed scenarios of gene duplication

In this section, we explore the behavior of (x,y) under commonly assumed scenarios of gene duplication, that is, the positive dosage, negative dosage, and neofunctionalization scenarios. For each of the three scenarios, we apply forms of $\xi(x)$ and $\phi(x)$ in Eq. (10) that are consistent with the idea of that scenario (see Table 2 for a summary).

3.1.1. Positive dosage scenario

The immediate impact of gene duplication is to increase the dosage, which would be either advantageous or deleterious depending on the situation. Here, in the positive dosage scenario, we assume that duplications of miRNA genes enhance the advantageous effects of the miRNA–target gene regulation when it is selectively positive (Fig. 5c), and vice versa (Fig. 5a). For example, for the original function that should have an advantageous interaction, $f^*(x)$ increases with increasing x. The same applies to other functions with $d < d^*(x)$.

To realize such a situation with Eq. (10), we set $\xi(x)$ and $\phi(x)$ such that $f^*(x)$ increases linearly with increasing x, while the value of $d^*(x)$ is fixed. In practice, $\xi(x)$ is defined as a linear function of x, $\xi(x) = \xi_0 + \alpha_{PD}(x-1)$, where $\alpha_{PD}$ is the coefficient of x that determines how much $\xi(x)$ increases, and $\xi_0$ is assumed to be 0.02. We also assume $\phi(x) = \phi_0 = 0.15$ for mathematical convenience; this assumption provides a situation where $d^*(x)$ is independent of x (therefore, $d^*(x)$ is treated as a constant $d^*$ for this scenario). Here, we set $\alpha_{PD} = 0.006$, and $F(d|x)$ for various x are shown as functions of d in Fig. 6a. Under this scenario, $F(d|x)$ increases linearly with x as long as $d < d^*$; otherwise $F(d|x)$ decreases linearly with x. Under this specific parameter setting, $y^* = 6$ from Eq. (11).

Fig. 6b shows the changes of $P_{(x,y)}(t)$ with time when the fitness function is given as in Fig. 6a. The upper panels are for the higher duplication rate $v_x^+ = 5.0 \times 10^{-7}$ and the lower panels are for $v_x^+ = 1.0 \times 10^{-7}$. Initially, the state evolves with both x and y increasing. While x keeps increasing, y saturates once the state approaches y = 6, or $(1, 1, 1, 1, 1, 1, 0, 0, 0, \ldots)$. Finally, the peak of the probability distribution $P_{(x,y)}(t)$ would move to the state $(x,y) = (\infty, 6)$. It should be noted that y = 6 corresponds to the number of gene families that can be positively regulated by the miRNA, that is, $y^* = 6$. It occasionally happens that a gene family that has a negative selection coefficient comes under regulation, but such a gene family should be eliminated from the system by losing the binding site.

Table 2
Summary of the definitions and outcomes under the four models.

| Scenario | $\xi(x)$ | $\phi(x)$ | $f^*(x)$ | $d^*(x)$ | Local peak of $\lim_{t\to\infty} P_{(x,y)}$ |
|---|---|---|---|---|---|
| Positive dosage | $\xi_0+\alpha_{PD}(x-1)$ | $\phi_0$ | $(1-h)(\xi_0+\alpha_{PD}(x-1))$ | $\sqrt{\log(1/h)}/\phi_0$ | $(\infty, \leq y^*)$ |
| Negative dosage | $\xi_0-\alpha_{ND}(x-1)$ | $\phi_0$ | $(1-h)(\xi_0-\alpha_{ND}(x-1))$ | $\sqrt{\log(1/h)}/\phi_0$ | $(1, \leq y^*)$ |
| Neofunctionalization | $\xi_0$ | $\phi_0/(\beta_{NF}(x-1)+1)$ | $(1-h)\xi_0$ | $(\beta_{NF}(x-1)+1)\sqrt{\log(1/h)}/\phi_0$ | $(\infty, \leq n+1)$ |
| Complexity growth | $\xi_0+\alpha_{CG}(x-1)$ | $\phi_0+\gamma_{CG}(x-1)$ | $(1-h)(\xi_0+\alpha_{CG}(x-1))$ | $\sqrt{\log(1/h)}/(\phi_0+\gamma_{CG}(x-1))$ | $(\infty, 1)$, $(1, \leq y^*)$, and intermediate states between the two (see text for details) |

Fig. 6. Probability distribution $P_{(x,y)}(t)$ under the three scenarios: (a), (b) the positive dosage, (c), (d) negative dosage and (e), (f) neofunctionalization scenarios. In each scenario, results for two parameter sets are shown ($v_x^+ = 5.0 \times 10^{-7}$, $v_y^+ = 1.0 \times 10^{-7}$, $v_x^- = 5.0 \times 10^{-5}$, and $v_y^- = 1.0 \times 10^{-5}$ in the upper panels and $v_x^+ = v_y^+ = 1.0 \times 10^{-7}$, $v_x^- = v_y^- = 1.0 \times 10^{-5}$ in the lower panels). The timeframe to show the results for each parameter set is optimized to demonstrate the point.

The results suggest that the most important parameter under this scenario would be $y^*$, which provides the upper boundary for y (see Eq. (11)). Eventually, while x keeps increasing, there would be a stationary distribution of y, which is given by the selection–mutation balance. The roles that other parameters play may be minor in terms of determining the direction of evolution. As shown in Fig. 6b, the difference in the mutation rates mainly affects the transient speed, but the effect should be weak on the eventual state, which was verified by further simulations with other mutation rates (data not shown). The major effect of population size and selection parameters would also be on the transient speed but not much on the eventual state. This general conclusion holds when more complicated assumptions on the fitness function are made. For example, when there is a non-linear relationship between x and $f^*(x)$, and even when there would be an optimal dosage of miRNA, x keeps increasing as long as $df^*(x)/dx > 0$, and it does not affect the upper limit of y (i.e. $y^*$).

3.1.2. Negative dosage scenario

The negative dosage scenario considers the dosage effect of gene duplication in the other direction; the advantageous effects of the miRNA–target gene regulation are weakened by duplications of miRNA genes (Fig. 5d, f). In contrast to the positive dosage scenario, we let $f^*(x)$ decrease with increasing x, while the value of $d^*(x)$ is fixed. In practice, $\xi(x)$ is defined as a linear function of x, $\xi(x) = \xi_0 - \alpha_{ND}(x-1)$, where $\alpha_{ND}$ is the coefficient of x that determines how much $\xi(x)$ decreases. Again, it is assumed that $\phi(x) = \phi_0 = 0.15$ so that $d^*(x)$ is independent of x (therefore, $d^*(x)$ is treated as a constant $d^*$ for this scenario). We assume $\xi_0 = 0.02$, $\alpha_{ND} = 0.002$, and $\phi_0 = 0.15$, and $F(d|x)$ for various x are shown as functions of d in Fig. 6c.
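As a worked illustration under the parameter values quoted above (a sketch for orientation, not an additional result), the peak of the landscape declines by a constant amount with each duplication. The value h = 0.5 is the one fixed earlier in the text.

```latex
\begin{align*}
f^*(x) &= (1-h)\,\xi(x) = (1-h)\bigl(\xi_0 - \alpha_{ND}(x-1)\bigr) \\
       &= 0.5\,\bigl(0.02 - 0.002\,(x-1)\bigr), \\
f^*(1) &= 0.010, \quad f^*(2) = 0.009, \quad f^*(3) = 0.008 .
\end{align*}
```

Each additional miRNA gene thus lowers $f^*$ by $(1-h)\alpha_{ND} = 0.001$ in this parameterization, while $d^*$ stays at about 5.5.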

Fig. 6d shows the changes of $P_{(x,y)}(t)$ over time when the fitness function is given as in Fig. 6c. The observed evolutionary direction is to increase y, while x cannot increase because duplication of miRNA genes is always deleterious. Then, y stops increasing because of its upper boundary (i.e. $y^*$). As in the positive dosage scenario, the number of gene families that are positively regulated is six under this parameter setting, so that y will be stuck at y = 6. Thus, it can be concluded that the eventual state of $P_{(x,y)}(t)$ would be given by x = 1, and y should have a stationary distribution with the upper limit at $y^*$ (although y could occasionally exceed $y^*$ by chance).

This conclusion is robust to the mutation rate, population size and selection parameters for the same reasons as for the positive dosage scenario; these parameters mainly affect the transient speed. The conclusion holds as long as $f^*(x)$ is given by a monotonically decreasing function of x.

3.1.3. Neofunctionalization scenario

Neofunctionalization is a scenario under which gene duplications are considered to increase the potential for novel functions. For duplications of miRNA genes, it would correspond to the capacity to establish meaningful interactions with new gene functions (Fig. 5g, i). We model this scenario of duplication such that $d^*(x)$ increases linearly with increasing x. We assume that $f^*(x)$ does not change by gene duplication, i.e. duplication is neutral for the original regulation (therefore, $f^*(x)$ is treated as a constant $f^*$ for this scenario). Accordingly, $\phi(x)$ is defined as an inverse function of x, $\phi(x) = \phi_0/(\beta_{NF}(x-1)+1)$, where $\beta_{NF}$ is the coefficient of x that determines the extent of the decrease of $\phi(x)$. Here, we set $\beta_{NF} = 10$, $\xi_0 = 0.02$ and $\phi_0 = 0.15$, and $F(d|x)$ for various x are shown as functions of d in Fig. 6e.
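For orientation, a worked calculation can be sketched for the neofunctionalization parameters quoted above, assuming h = 0.5 as fixed earlier in the text:

```latex
\begin{align*}
d^*(x) &= \frac{\sqrt{\log(1/h)}}{\phi(x)}
        = \bigl(\beta_{NF}(x-1)+1\bigr)\,\frac{\sqrt{\log(1/h)}}{\phi_0}
        \approx 5.55\,\bigl(10\,(x-1)+1\bigr), \\
d^*(1) &\approx 5.55, \quad d^*(2) \approx 61.1, \qquad
f^*(x) = (1-h)\,\xi_0 = 0.01 \ \text{for all } x .
\end{align*}
```

In this sketch a single duplication widens the range of advantageously regulated distances roughly elevenfold, although the number of families that can actually be regulated remains bounded by the n + 1 functions in the distance vector.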

Fig. 6f shows the changes of $P_{(x,y)}(t)$ over time when the fitness function is given as in Fig. 6e. Under this scenario, because duplication always has a positive effect, x should increase. As x increases, $d^*(x)$ (and $y^*$) increases, which allows y to increase. With our parameter setting, the initial move of the state is that y increases to approach the upper boundary, and then both x and y increase gradually (Fig. 6f). The forward mutation rate affects the transient speed. Thus, under the neofunctionalization scenario, both x and y increase; this conclusion should hold as long as $d^*(x)$ is given by a monotonically increasing function of x.

3.1.4. Summary of the move of $P_{(x,y)}(t)$ under the three common models

Through the overview of the three models, the behavior of (x,y) is quite obvious and predictable from the parameter setting. For example, if we assume parameters such that $y^* = 6$ holds in the positive dosage model, then y can increase up to 6 and saturates. Thus, our models are so simple that (x,y) behaves as we intuitively predict from the parameter setting.

All three simple models assumed that either $d^*(x)$ or $f^*(x)$ is fixed and the other is given by a function of x, the number of miRNA genes. The consensus evolutionary pattern under the three models seems to be that there is a unidirectional move of $P_{(x,y)}(t)$. In other words, there is a single peak of $P_{(x,y)}(t)$ when $t \to \infty$, as summarized in Table 2.

This conclusion should be robust to the population size, mutation rates and selection parameters (unless Ns is very small). All these parameters are involved in determining the variance of $P_{(x,y)}(t)$, the transition pathway and the transient speed. The variance of $P_{(x,y)}(t)$ is mostly affected by the population size N, which determines the extent of genetic drift; when N is small, deleterious mutations can fix by chance, producing a broader distribution of $P_{(x,y)}(t)$. The transition pathway and speed are obviously affected by the mutation rates, population size and the fitness landscape, because they are the major factors that determine the substitution rates (Eqs. (2)–(5)).

3.2. Application to the Arabidopsis data

In the previous section, typical behaviors of (x,y) were considered under the three commonly assumed scenarios of gene duplication. The consensus is that in all scenarios where either $f^*(x)$ or $d^*(x)$ is fixed and the other is given by a function of x, there is a unidirectional move of (x,y), although the directions are different. These theoretical results are not consistent with the observation in Arabidopsis. Fig. 7 shows the observed distribution of (x,y) (modified from Takuno and Innan, 2008), indicating that both x and y seem to stay at relatively small numbers for a very long time. It should be noted that this is not simply because these miRNA systems are young: the figure shows only very ancient ones that are shared by both Arabidopsis thaliana and Oryza sativa (diverged 125–150 million years ago, Takuno and Innan, 2008). For such ancient systems, both x and y are very unlikely to stay small under the three unidirectional hypotheses. It is suggested that a more complex model will be needed to understand the observation in the Arabidopsis data.

Fig. 7. Scatter plot of the numbers of miRNA genes and target gene families. Modified from Takuno and Innan (2008). The number of miRNA systems with the same state of (x,y) is represented by the shape of the symbol.


3.3. Complexity growth scenario

To explain the observed pattern of miRNA genes, it seems to be necessary to incorporate evolutionary forces in at least two directions. Such a model was verbally argued in Takuno and Innan (2008). The logic behind their model stemmed from the biological nature of the miRNA gene regulation process, as briefly summarized in the following. A duplication of a miRNA gene creates two identical copies in the genome. They would have similar expression patterns at the beginning, and the immediate effect would be related to dosage. However, it is known that changes in the expression pattern of a coding gene are much faster than its functional changes (e.g. Odom et al., 2007; Borneman et al., 2007; Tischler et al., 2008), so that a new miRNA gene likely obtains a unique temporal and spatial expression pattern. A possible advantage of duplication of miRNA genes would be that a more complex and sophisticated design for the existing target genes can be established with multiple miRNA genes than with a simple regulation system with a single miRNA gene. The idea is similar to subfunctionalization of duplicated genes followed by escape from adaptive conflict (Hughes, 1994; Des Marais and Rausher, 2008; Rueffler et al., 2012). Thus, we can consider that the number of miRNA genes affects how well the miRNA regulation system can be optimized to a certain function by making itself a complicated and sophisticated system. In other words, the number of miRNA genes would determine the potential "complexity" of the regulation system.

On the other hand, duplications of miRNA genes should not be advantageous for most of the other functions, once the system has already been optimized to the existing target gene(s). This can be easily imagined because a system specialized to one particular function should be difficult to use for other gene families. Thus, the disadvantage of increasing the number of miRNA genes would be to narrow down the potential to have positive regulatory interactions with other functional gene families.

To model this scenario with bidirectional fitness effects of miRNA gene duplication, we set $\xi(x)$ and $\phi(x)$ in Eq. (10) such that $f^*(x)$ increases linearly with increasing x, while $d^*(x)$ decreases. That is, in Eq. (10), $\xi(x)$ and $\phi(x)$ are defined as $\xi_0 + \alpha_{CG}(x-1)$ and $\phi_0 + \gamma_{CG}(x-1)$, respectively. $\alpha_{CG}$ and $\gamma_{CG}$ are the coefficients of x that determine the increments of $\xi(x)$ and $\phi(x)$, respectively (see also Table 2). This model is referred to as the complexity growth model.

Two examples of the fitness landscape under the complexity growth model are shown in Fig. 8a and c. $\alpha_{CG} = 0.025$ and 0.01 are assumed in Fig. 8a and c, respectively, while $\gamma_{CG} = 0.15$ is fixed. For each of the assumed fitness landscapes, the transient process of $P_{(x,y)}(t)$ is shown with two values of the mutation rate (Fig. 8b, d). The four parameter sets provide various transient patterns of $P_{(x,y)}(t)$. Generally, both x and y first increase, and then multiple peaks are formed as t increases. These patterns are very different from those under the three scenarios that exhibit unidirectional transition of $P_{(x,y)}(t)$. The existence of these local peaks indicates that there are multiple fitness peaks, at which (x,y) stays for a substantially long time.

The locations of local peaks depend on the parameter sets. In the upper panel of Fig. 8b ($v_x^+ = 5.0 \times 10^{-7}$), there are two major peaks at $(x,y) = (5,2)$ and $(\geq 8, 1)$, while in the lower panel ($v_x^+ = 1.0 \times 10^{-7}$), two major peaks are located at $(x,y) = (5,2)$ and $(1,6)$. In Fig. 8d, we observe several peaks: $(x,y) = (\geq 8, 1)$, $(4,2)$, $(2,3)$, and $(1,6)$ in the upper panel ($v_x^+ = 5.0 \times 10^{-7}$), and $(x,y) = (4,2)$, $(2,3)$, and $(1,6)$ in the lower panel ($v_x^+ = 1.0 \times 10^{-7}$).

To understand the general behavior of $P_{(x,y)}(t)$, we first focus on the effect of the selection parameters, which determine the locations of local peaks. The relationship between the location of local peaks and the selection parameters is relatively simple; $d^*(x)$ (or $y^*$) plays the major role as long as $d^*(x)$ is given by a monotonically decreasing function of x. As mentioned earlier, $y^*$ provides the theoretical upper limit of y conditional on x; therefore, local peaks potentially appear at $(x, y^*|x)$.
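The candidate peak locations $(x, y^*|x)$ can be tabulated with a short Python sketch that applies Eq. (11) to the complexity growth form of $\phi(x)$. The function names and the upper end of the distance vector (n = 20) are assumptions made for illustration; $\phi_0 = 0.15$, $\gamma_{CG} = 0.15$ and h = 0.5 are the values quoted in the text. Because $\xi(x) > 0$ under this model, the sign of $F(d|x)$ depends only on whether $\exp[-(\phi(x)d)^2]$ exceeds h.

```python
import math

H = 0.5            # h in Eq. (10)
PHI0 = 0.15        # phi_0
GAMMA_CG = 0.15    # gamma_CG, the value quoted for Fig. 8
N_MAX = 20         # assumed largest functional distance n (not given in the text)


def phi_cg(x):
    """phi(x) = phi_0 + gamma_CG * (x - 1) under the complexity growth model."""
    return PHI0 + GAMMA_CG * (x - 1)


def y_star(x):
    """Eq. (11) with d_i = i: largest distance with a positive fitness term, plus one.

    With xi(x) > 0, F(d|x) > 0 exactly when exp(-(phi(x) * d)**2) > h.
    """
    positive = [d for d in range(N_MAX + 1)
                if math.exp(-(phi_cg(x) * d) ** 2) > H]
    return max(positive) + 1


candidate_peaks = [(x, y_star(x)) for x in range(1, 9)]
print(candidate_peaks)
```

Under these assumptions the candidate states include (1,6), (2,3), (4,2) and (5,2), and y* falls to 1 for x of 6 or more, which is in line with the peak locations reported for Fig. 8.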

Fig. 8. Probability distribution $P_{(x,y)}(t)$ under the complexity growth scenario. Results for two values of $\alpha_{CG}$ are shown ($\alpha_{CG} = 0.1$ in (a) and (b); $\alpha_{CG} = 0.05$ in (c) and (d)). Other parameters are identical to those used for Fig. 6.


The mutation rates ($v_x^+$ and $v_y^+$) and the selection parameters ($\alpha_{CG}$ and $\gamma_{CG}$) determine the primary direction in which the state moves. $v_x^+$ and $\alpha_{CG}$ determine the extent of the move in the direction to increase x, while $v_y^+$ and $\gamma_{CG}$ determine the extent in the other direction to increase y. As a consequence, if the relative contribution of the former two parameters dominates that of the latter two, it is likely that the transient process of $P_{(x,y)}(t)$ is skewed in the direction to increase x, and vice versa. These four parameters also affect the transient speed of $P_{(x,y)}(t)$.

This complexity growth model obviously explains the observation in the Arabidopsis data (Fig. 7) much better than the three basic models with unidirectional evolutionary pressure. The observed distribution of (x,y) indicates that there are at least two major types of miRNA systems. One is those with large x and y = 1, and the other is those with x = 1 and large y. The major difference between the complexity growth model and the other three is that only the former can explain the co-existence of both types. As demonstrated in Fig. 8, under the complexity growth model, it is possible to predict a distribution of (x,y) where each of the two types constitutes a reasonable proportion. On the other hand, under any of the other three models, it is very difficult to find a parameter set to produce such a distribution. Under the negative dosage model, selection does not allow an increase of x because duplication is always deleterious (Fig. 6d). Under the positive dosage model, it is easy to imagine that x simply increases as duplication is always positively selected (upper panels in Fig. 6b). If the mutational pressure to increase y is large, y can also increase but x eventually starts increasing (lower panels in Fig. 6b). In either case, the two types cannot co-exist. Furthermore, this model would predict some proportion of miRNA systems that have relatively large x and y, which are not observed in the Arabidopsis data (Fig. 7). Under the neofunctionalization model, as shown in Fig. 6f, both x and y increase simultaneously, so that some proportion of miRNA systems with large x and y are predicted, similar to the positive dosage model. Through the increase of x and y, the path of (x,y) might go through states with large x and small y if the pressure to increase x dominates that of y, and vice versa. However, this does not explain the co-existence of the two major types.

4. Discussion

We have developed a flexible model to explore how a gene regulation system evolves through gene duplication. Our modeling is based on extensive knowledge of the molecular biology of plant miRNAs. A miRNA system consists of miRNAs and their target genes. It is evolutionarily obvious that a system originates from the establishment of an interaction between a single miRNA (coded by a single miRNA gene) and a single target gene, and that its subsequent evolution heavily relies on duplications of the miRNA genes, which should determine the regulation pattern of their target genes. In addition, the model incorporates a new functional gene family joining the existing system by acquiring a miRNA binding site. Duplications of target genes are also common, but our model ignored them because those duplicated target genes usually maintain a similar function. Rather, we were interested in how many kinds of functional gene families are regulated by a particular miRNA. Therefore, we focused on the number of miRNA genes (denoted by x) and the number of distinct functional gene families (denoted by y), and the evolutionary behavior of (x,y) was theoretically explored.

We first considered two simple models, in which duplication of miRNA genes affects only the miRNA dosage, either positively or negatively (the positive and negative dosage models). The evolutionary behavior is quite simple in these two models. In the positive dosage model, (x,y) shifts towards $(\infty, y^*)$. This means that because increasing dosage of miRNA is always advantageous, x keeps increasing, whereas $y^*$ provides the upper limit for the increase of y. On the other hand, in the negative dosage model, because duplications of miRNA genes are deleterious, x cannot increase and likely drifts around x = 1. Similar to the positive dosage model, $y^*$ provides the upper limit for the increase of y.

The third model we considered is the neofunctionalization model, in which duplications of miRNA genes increase the number of functional gene families that are positively regulated. In this model, because $y^*$ increases with increasing x, both x and y simultaneously increase.

Thus, in these three simple models, the transient process of (x,y) seems unidirectional, although the direction is different. It is obvious that none of the three models can explain the observation of Arabidopsis miRNA systems (Fig. 7). To account for this observation, we modified the model such that both positive and negative effects of duplications of miRNA genes are incorporated simultaneously. In practice, it is assumed that as x increases, $f^*(x)$ increases while $d^*(x)$ (or $y^*$) decreases. Biologically, this means that duplication of miRNA genes can increase the fitness of the original target gene family because the regulation system can be tuned up by using multiple miRNA genes. This tuning-up should have a cost; if the system is optimized to a particular function, it would be difficult for other gene families to use it. Therefore, our model assumes that duplications of miRNA genes decrease the number of gene families that are positively regulated (i.e. $y^*$). As mentioned above, this model is consistent with the biological nature of miRNA regulation systems. We found that this model can explain the Arabidopsis data reasonably well.

Our demonstration suggests that the complexity growth model could explain the observation of Arabidopsis miRNA systems (Fig. 7) much better than the other three models. The result should be taken to indicate that there would be at least two different directions for the evolution of miRNA systems, rather than that our complexity growth scenario is an exclusive explanation of the observation, because the real situation should be much more complicated. Our model has a number of oversimplified assumptions, for example, equal intensity of selection and equal mutation rates for all miRNA systems, simplified vectors for functional distance ($\mathbf{d}$) and regulation states ($L$), and also simplified functions for the fitness landscape ($F(d|x)$). Comprehensive theoretical analyses relaxing these assumptions would help to further understand the evolutionary behavior of miRNA systems. Nevertheless, this work provides a basic theoretical framework to study the evolution of regulation systems. The model can be flexibly applied to other gene regulation systems, e.g. regulation systems by transcription factors. Such an application should contribute to our understanding of the evolution of complex biosystems. In addition to theoretical approaches, further genomic analyses at the expression level are needed to verify our conclusions. Comparative analyses with closely related species would be particularly powerful. For example, a number of miRNAs were recently identified in A. lyrata from the genomic sequencing of this species (Fahlgren et al., 2010; Ma et al., 2010). Analyzing the two species simultaneously will provide new insights into the evolution of miRNA systems. One example is Takuno and Innan (2011), who demonstrated that the expression of miRNA target genes is extensively tuned up by selection through optimizing codon usage. This observation demonstrates the importance of evolutionary changes to expression patterns during the evolution of miRNA systems, and is consistent with our conclusions.

Acknowledgments

This work was partly supported by a grant from the Japan Science and Technology Agency (JST) to H.I. T.A. is a JSPS post-doctoral fellow (Japan Society for the Promotion of Science). The authors very much thank the two anonymous reviewers for their useful and kind comments that improved the manuscript a lot.

References

Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P., 2008. Molecular Biology of the Cell, 5th ed. Garland Science, New York, pp. 493–494.

Axtell, M.J., Bartel, D.P., 2005. Antiquity of microRNAs and their targets in land plants. Plant Cell 17, 1658–1673.

Axtell, M.J., Bowman, J.L., 2008. Evolution of plant microRNAs and their targets. Trends Plant Sci. 13, 343–349.

Backman, T.W., Sullivan, C.M., Cumbie, J.S., Miller, Z.A., Chapman, E.J., Fahlgren, N., Givan, S.A., Carrington, J.C., Kasschau, K.D., 2008. Update of ASRP: the Arabidopsis Small RNA Project database. Nucl. Acids Res. 36, 982–985.

Bartel, D.P., 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297.

Bonnet, E., Wuyts, J., Rouze, P., Van de Peer, Y., 2004. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc. Natl. Acad. Sci. USA 101, 11511–11516.

Borneman, A.R., Gianoulis, T.A., Zhang, Z.D., Yu, H., Rozowsky, J., Seringhaus, M.R., Wang, L.Y., Gerstein, M., Snyder, M., 2007. Divergence of transcription factor binding sites across related yeast species. Science 317, 815–819.

Carrington, J.C., Ambros, V., 2003. Role of microRNAs in plant and animal development. Science 301, 336–338.

Chen, K., Rajewsky, N., 2007. The evolution of gene regulation by transcription factors and microRNAs. Nat. Rev. Genet. 8, 93–103.

Crow, J.F., Kimura, M., 1970. An Introduction to Population Genetics Theory. Harper and Row, New York.

Des Marais, D.L., Rausher, M.D., 2008. Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature 454, 762–765.

Fahlgren, N., Jogdeo, S., Kasschau, K.D., Sullivan, C.M., Chapman, E.J., Laubinger, S., Smith, L.M., Dasenko, M., Givan, S.A., Weigel, D., Carrington, J.C., 2010. MicroRNA gene evolution in Arabidopsis lyrata and Arabidopsis thaliana. Plant Cell 22, 1074–1089.

Floyd, S.K., Bowman, J.L., 2004. Gene regulation: ancient microRNA target sequences in plants. Nature 428, 485–486.

He, L., Hannon, G.J., 2004. MicroRNAs: small RNAs with a big role in gene regulation. Nat. Rev. Genet. 5, 522–531.

Hughes, A.L., 1994. The evolution of functionally novel proteins after gene duplication. Proc. Biol. Sci. 256, 119–124.

Innan, H., Kondrashov, F., 2010. The evolution of gene duplications: classifying and distinguishing between models. Nat. Rev. Genet. 11, 97–108.

Jiang, D., Yin, C., Yu, A., Zhou, X., Liang, W., Yuan, Z., Xu, Y., Yu, Q., Wen, T., Zhang, D., 2006. Duplication and expression analysis of multicopy miRNA gene family members in Arabidopsis and rice. Cell Res. 16, 507–518.

Jones-Rhoades, M.W., Bartel, D.P., 2004. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol. Cell 14, 787–799.

Kozomara, A., Griffiths-Jones, S., 2011. miRBase: integrating microRNA annotation and deep-sequencing data. Nucl. Acids Res. 39, D152–D157.

Llave, C., Xie, Z., Kasschau, K.D., Carrington, J.C., 2002. Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053–2056.

Lynch, M., 2007. The Origins of Genome Architecture. Sinauer Associates, Sunderland.

Ma, Z., Coruh, C., Axtell, M.J., 2010. Arabidopsis lyrata small RNAs: transient MIRNA and small interfering RNA loci within the Arabidopsis genus. Plant Cell 22, 1090–1103.

Maher, C., Stein, L., Ware, D., 2006. Evolution of Arabidopsis microRNA families through duplication events. Genome Res. 16, 510–519.

Odom, D.T., Dowell, R.D., Jacobsen, E.S., Gordon, W., Danford, T.W., MacIsaac, K.D., Rolfe, P.A., Conboy, C.M., Gifford, D.K., Fraenkel, E., 2007. Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat. Genet. 39, 730–732.

Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., Bartel, D.P., 2002. MicroRNAs in plants. Genes Dev. 16, 1616–1626.

Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., Bartel, D.P., 2002. Prediction of plant microRNA targets. Cell 110, 513–520.

Rueffler, C., Hermisson, J., Wagner, G.P., 2012. Evolution of functional specialization and division of labor. Proc. Natl. Acad. Sci. USA 109, 326–335.

Schwab, R., Palatnik, J.F., Riester, M., Schommer, C., Schmid, M., Weigel, D., 2005. Specific effects of microRNAs on the plant transcriptome. Dev. Cell 8, 517–527.

Takuno, S., Innan, H., 2008. Evolution of complexity in miRNA-mediated gene regulation systems. Trends Genet. 24, 56–59.

Takuno, S., Innan, H., 2011. Selection fine-tunes the expression of microRNA target genes in Arabidopsis thaliana. Mol. Biol. Evol. 28, 2429–2434.

Tanzer, A., Reister, M., Hertel, J., Bermudez-Santana, C.I., Gorodkin, J., Hofacker, I.L., Stadler, P.F., 2010. Evolutionary genomics of microRNAs and their relatives. In: Caetano-Anolles, G. (Ed.), Evolutionary Genomics and Systems Biology. Wiley-Blackwell, pp. 295–327.

Tanzer, A., Stadler, P.F., 2004. Molecular evolution of a microRNA cluster. J. Mol. Biol. 339, 327–335.

Tischler, J., Lehner, B., Fraser, A.G., 2008. Evolutionary plasticity of genetic interaction networks. Nat. Genet. 40, 390–391.

Xie, Z., Kasschau, K.D., Carrington, J.C., 2003. Negative feedback regulation of Dicer–Like1 in Arabidopsis by microRNA-guided mRNA degradation. Curr. Biol. 13, 784–789.
