

On the design of neural networks in the brain by genetic evolution

Edmund T. Rolls*, Simon M. Stringer

Department of Experimental Psychology, Oxford University, South Parks Road, Oxford OX1 3UD, UK

Received 1 October 1999

Abstract

Hypotheses are presented of what could be specified by genes to enable the different functional architectures of the neural networks found in the brain to be built during ontogenesis. It is suggested that for each class of neuron (e.g., hippocampal CA3 pyramidal cells) a small number of genes specify the generic properties of that neuron class (e.g., the number of neurons in the class, and the firing threshold), while a larger number of genes specify the properties of the synapses onto that class of neuron from each of the other classes that makes synapses with it. These properties include not only which other neuron classes the synapses come from, but whether they are excitatory or inhibitory, the nature of the learning rule implemented at the synapse, and the initial strength of such synapses. In a demonstration of the feasibility of the hypotheses to specify the architecture of different types of neuronal network, a genetic algorithm is used to allow the evolution of genotypes which are capable of specifying neural networks that can learn to solve particular computational tasks, including pattern association, autoassociation, and competitive learning. This overall approach allows such hypotheses to be further tested, improved, and extended with the help of neuronal network simulations with genetically specified architectures, in order to develop further our understanding of how the architecture and operation of different parts of brains are specified by genes, and how different parts of our brains have evolved to perform particular functions. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Genetic algorithm; Natural selection; Cerebral cortex; Pattern association; Autoassociation; Competitive networks

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558

2. Description of the genes that could build different types of neuronal network in different brain areas . . . . . . . . . 559

3. Genetic selection of neural network parameters to produce different network architectures with different functions . . . . . . . . . 563

4. Simulation of the evolution of neural networks using a genetic algorithm . . . . . . . . . 564
   4.1. The neural networks . . . . . . . . . 564
   4.2. The specification of the genes . . . . . . . . . 564
   4.3. The genetic algorithm, and general procedure . . . . . . . . . 567
   4.4. Pattern association networks . . . . . . . . . 568
   4.5. Autoassociative networks . . . . . . . . . 572
   4.6. Competitive networks . . . . . . . . . 575

Progress in Neurobiology 61 (2000) 557–579

0301-0082/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.

PII: S0301-0082(99)00066-0

www.elsevier.com/locate/pneurobio

* Corresponding author. Tel.: +44-1865-271348; fax: +44-1865-310447.

E-mail address: [email protected] (E.T. Rolls).


5. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579

1. Introduction

Analysis of the structure and function of different brain areas is starting to reveal how the operation of networks of neurons may implement the functions of each brain area. In particular, the operation of a number of simple types of network, which are each found in a number of different brain areas, is becoming understood (Rolls and Treves, 1998). One example of these networks is pattern association memories in which one input (e.g., an unconditioned stimulus) drives the output neurons through unmodifiable synapses, and a second input (e.g., a conditioned stimulus) has associatively modifiable synapses onto the output neurons, so that by associative learning it can come to produce the same output as the unconditioned stimulus (see Fig. 1). Pattern association memory may be implemented in structures such as the amygdala and orbitofrontal cortex to implement stimulus-reinforcement learning, e.g. associating the sight of a stimulus with pain (an example of the learning involved in emotion), or the sight of another stimulus with the taste of food (an example of the role of this type of learning in motivational behavior) (Rolls and Treves, 1998; Rolls, 1999). Pattern association learning may also be used in the connections of backprojecting neurons in the cerebral cortex onto the apical dendrites of neurons in the preceding cortical area, as these connections may be associatively modifiable (Rolls and Treves, 1998). A second example is autoassociation networks characterized by recurrent collateral axons with associatively modifiable synapses, which may implement functions such as short term memory in the cerebral cortex, and episodic memory in the hippocampus (see e.g., Fig. 2). A third example is competitive learning, where there is one major set of inputs to a network connected with associatively modifiable synapses, and mutual (e.g., lateral) inhibition between the output neurons (through e.g., inhibitory feedback neurons) (see Fig. 3). Competitive networks can be used to build feature analyzers by learning to respond to clusters of inputs which tend to co-occur, and may be fundamental building blocks of perceptual systems (Rolls and Treves, 1998). Allowing short range excitatory connections between neurons (as in the cerebral cortex) and longer range inhibitory connections can lead to the formation of topographic maps where the closeness in the map reflects the similarity between the inputs being mapped (Kohonen, 1995; Hertz et al., 1990; Rolls and Treves, 1998).

The question then arises, as part of understanding brain function, of how the networks found in different brain areas actually have evolved, and how they may be specified by genes.

Fig. 2. The architecture of an autoassociative neural network.

Fig. 1. A pattern association neural network. An unconditioned stimulus has activity or firing rate e_i for the ith neuron, and produces firing r_i of the ith neuron. The conditioned stimuli have activity or firing rate r'_j for the jth axon.



A principal aim of this paper is, by a process of comparing the networks found in different brain areas, and in particular how they differ from each other, to suggest a set of parameters being specified by genes which could lead to the building during development of the neuronal network functional architectures found in different brain regions. The choice of parameters is guided not only by comparison of the functional architecture of different brain regions, but also by what parameters, if specified by genes, could with a reasonably small set actually build the networks found in different brain regions. The idea is that if these parameters are specified by different genes, then genetic reproduction and natural selection using these parameters could lead to the evolution of neuronal networks in our brains well adapted for different functions.

A second principal aim of the paper is to show how the sufficiency of the general approach, and the appropriateness of the particular parameters selected, can be tested and investigated using genetic algorithms which actually use the hypothesised genes to specify networks. We show this by implementing a genetic algorithm, and testing whether it can search the high dimensional space provided by the suggested parameters to specify and build neuronal networks that will solve particular computational problems. The computational problems we choose for this paper are simple and well defined problems that can be solved by one-layer networks. These problems are pattern association, autoassociation, and competition, which require quite different architectures for them to be solved (Rolls and Treves, 1998; Hertz et al., 1990) (see Figs. 1-3). Because the problems to be solved are well specified, we can define a good fitness measure for the operation of each class of network, which will be used to guide the evolution by reproduction involving genetic variation and selection in each generation. Although we do not suppose that the actual parameters chosen here for illustration are necessarily those specified by mammalian genes, they have been chosen because they seem reasonable given the differences in the functional architecture of different brain areas, and allow illustration of the overall concept described in this paper about how different network architectures found in different brain regions evolve. Although the computational problems to be solved here have been chosen to have well understood one-layer neural network solutions, once this general approach has been established, it would be of great interest in future to examine problems which are solved by multilayer networks in the brain, e.g., invariant object recognition (Ullman, 1996; Rolls and Treves, 1998; Wallis and Rolls, 1997), and episodic memory (Rolls and Treves, 1998; McClelland et al., 1995), in order to understand how brain mechanisms to solve complex problems may evolve and how they may operate.

In recent years there has been increasing interest in the application of genetic algorithms to neural networks (see Vonk et al., 1997). Some of the applications that are potentially relevant to understanding biological nervous systems range from computer simulated animats (Nolfi et al., 1994; Gracias et al., 1996; Huber et al., 1996; Lund and Parisi, 1996) to behavioural robotics (Floreano and Mondada, 1994; Hoshino and Tsuchida, 1996; Kuwana et al., 1996; Husbands et al., 1998), with the genetic algorithm used as a tool to discover neural networks capable of solving certain kinds of task. However, the purpose of this paper is to explore how genes could specify the actual neuronal network functional architectures found in the mammalian brain, such as those found in the cerebral cortex. Indeed, this paper takes examples of some of the actual architectures and prototypical networks found in the cerebral cortex, and explores how these architectures could be specified by genes which allow the networks when built to implement some of the prototypical computational problems that must be solved by neuronal networks in the brain. Moreover, the three fundamental kinds of biological network considered here, pattern associators, auto-associators and competitive nets, are potentially the heterogeneous building blocks for the multilayer architectures found in mammalian brain systems such as the hippocampus and cerebral cortex (Rolls and Treves, 1998). Investigation of how genes could specify the operation of these single layer components of multilayer architectures is a necessary first step towards understanding how more complex multilayer architectures could be built.

2. Description of the genes that could build different types of neuronal network in different brain areas

The hypotheses for the genes that might specify the different types of neuronal network in different brain areas are introduced next. The hypotheses are based on knowledge of the architectures of different brain regions, which are described in sources such as Rolls and Treves (1998), Shepherd (1990), Braitenberg and Schuz (1991), Peters and Jones (1984) and Somogyi et al. (1998), and on a knowledge of some of the parameters that influence the operation of neuronal networks. These genes were used in the simulations in which evolution of the architectures was explored using genetic algorithms. The emphasis in this section is on the rationale for suggesting different genes. A more formal specification of the genes, together with additional information on how they were implemented in the simulations, is provided in Section 4.2. It may also be noted that large numbers of genes will be needed to specify, for example, the operation of a neuron. This paper focusses on those genes that, given the basic building blocks of neurons, may specify the differences in the functional architecture of different brain regions. In addition, we note that here we do not try to describe the operation and properties of simple neural networks, nor the empirical phenomena such as synaptic long-term potentiation and long-term depression which provide empirical support for some of the parameters chosen, but instead refer the reader to sources such as Rolls and Treves (1998), to which cross references are given and where references to the background literature are provided, and to Koch (1999).

Fig. 3. The architecture of a competitive neural network.

The overall conception is as follows. There are far too few genes (in humans 60,000-80,000; Maynard Smith and Szathmary, 1999) to specify each synapse

(i.e., which neuron is connected to which other neuron, and with what connection strength) in the brain. (The number of synapses in the human brain, with 10^10-10^11 neurons in the brain, and perhaps an average of 10,000 synapses each, is in the order of 10^14-10^15.) In any case, brain design is likely to be more flexible if the actual strength of each synapse is set up by self-organisation and experience. On the other hand, brain connectivity is far from random. Some indications about what is specified can be gathered by considering the connectivity of the hippocampus (see Rolls and Treves, 1998, Chapter 6). The CA3 pyramidal cells each receive approximately 50 mossy fibre synapses from dentate granule cells, 12,000 synapses from other CA3 cells formed from recurrent collaterals, and 3600 perforant path synapses originating from entorhinal cortex neurons (see Fig. 4, after Rolls and Treves (1998), Fig. 6.6). In the preceding stage, the 1,000,000 dentate granule cells (the numbers given are for the rat) receive one main source of input, from the entorhinal cortex, and each makes approximately 14 synapses with CA3 cells (see Fig. 4). On the basis of considerations of this type for many different brain areas, it is postulated that for each class of cell the genome specifies the approximate numbers of synapses the class will receive from a specified other class (including itself, for recurrent collaterals), and the approximate number of synapses its axons will make onto specified classes of target cells (including itself). The individual neurons with which synapses are made are not specified, but are chosen randomly, though sometimes under a constraint (specified by another gene) about how far away the axon should travel in order to make connections with other neurons. One parameter value of the latter gene might specify the widespread recurrent collateral system of the CA3 neurons (Rolls and Treves, 1998, Section 6.4). Another value for the latter might specify much more limited spread of recurrent collaterals with the overall density decreasing rapidly from the cell of origin, which, as in the cerebral cortex and if accompanied by longer range inhibitory processes implemented by feedback inhibitory neurons, would produce center-surround organisation, and tend to lead to the formation of topological maps (see Kohonen (1995) and Rolls and Treves (1998), Section 4.6). The actual mechanism by which this is implemented is not the subject of this paper, but would presumably involve some (genetically specified) chemical recognition process, together with the production of a limited quantity of a trophic substance that would limit the number of synapses from, and made to, each other class of cell. Some of these processes would of course be taking place primarily during development (ontogenesis), when simple rules such as making local connections as a function of distance away would be adequate to specify the connectivity, without the need for long pathway connection routes (such as between the substantia nigra and the striatum) to be genetically encoded. It is presumably because of the complex genetic specification that would be required to specify in the adult brain the route to reach all target neurons that epigenetic factors are so important in embryological development, and that as a general rule the formation of new neurons is not allowed in adult mammalian brains.

Fig. 4. Schematic architecture of hippocampal CA3 pyramidal cell network.
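As a purely illustrative sketch (not the authors' code, and with hypothetical key names), the postulate that the genome specifies approximate synapse numbers per source class, plus a constraint on how far axons travel, could be written out as follows, using the rat CA3 figures quoted above.

# Hypothetical per-class afferent specification for a CA3 pyramidal cell,
# using the approximate rat numbers quoted in the text; key names are ours.
ca3_afferent_genes = {
    "dentate_granule_mossy_fibre": 50,      # mossy fibre synapses
    "ca3_recurrent_collateral":    12_000,  # recurrent collateral synapses
    "entorhinal_perforant_path":   3_600,   # perforant path synapses
}

# A further hypothetical gene could constrain how far the axons of a class
# travel; the individual target neurons are then chosen at random under
# that constraint, as postulated above.
ca3_axon_spread_gene = {"ca3_recurrent_collateral": "widespread"}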

It is a feature of brain neuronal network design that not only is which class of neuron to connect to apparently specified, but also approximately where on the dendrite the connections from each other class of neuron should be received. For example, in the CA3 system, the mossy fibres synapse closest to the cell body (where they can have a strong influence on the cell), the CA3 recurrent collaterals synapse on the next part of the dendrite, and the entorhinal inputs synapse on the more distal ends of the dendrites. The effect of the proximal relative to the more distal synapses will depend on a number of factors, including for distal inputs whether proximal inputs are active and thereby operating as current shunts producing division, and on the diameter of the dendrite, which will set its cable properties (Koch, 1999). If the dendritic cable diameter is specified as large, all inputs (including distal inputs) will sum reasonably linearly to inject current into the cell body, whereas if specified as small it will tend to result in local computation on a dendrite, with perhaps strong local depolarization produced by a set of locally coactive inputs and synaptic modification local to a close coactive set of synapses on a dendrite, resulting in higher order association performed by what are called Sigma-Pi neurons (see Rolls and Treves, 1998, Section 1.5; Koch, 1999, Section 14.4.2). (Such higher order neural computation is not the main theme of Rolls and Treves (1998), nor of the networks considered in this paper, which generally operate by a uniform post-synaptic learning term, rather than one local to a patch of dendrite, though the genetic specification can easily be extended to incorporate the cable properties of neurons, by for example specifying parameters such as the diameter of the dendrite; see Koch, 1999.)

Although inhibitory neurons have not been included in the examples already given, similar specifications would be applicable, including for example a simple specification of the different parts of the dendrite on which different classes of inhibitory neuron synapse in the hippocampus (Buhl et al., 1994), and hence whether the effect is subtractive or shunting (Koch, 1999). In cortical areas, both feedforward and feedback inhibition (the latter from pyramidal cells via inhibitory neurons back to the same population of pyramidal cells) could be produced by a simple genetic specification of this type.

Next, the nature of the synaptic connections, and the learning rule for synaptic modifiability, must be specified. One gene specifies in the simulation described later whether a given neuron class is excitatory or inhibitory. In the brain, this gene (or genes) would specify the transmitter (or transmitters in some cases) that are released, with the actual effects of, for example, glutamate being excitatory, and gamma-amino-butyric acid (GABA) inhibitory. The learning rule implemented at each synapse is determined by another gene (or genes). One possible effect specified is no synaptic modification. Another is a Hebb rule of associative synaptic plasticity (increase the synaptic strength if both presynaptic and postsynaptic activity are high, the simple rule implemented by associative long-term potentiation (LTP)). For this, the genes might specify NMDA (N-methyl-D-aspartate) receptors on the post-synaptic neurons together with the linked intracellular processes that implement LTP (Rolls and Treves, 1998; Wang et al., 1997; Buonomano and Merzenich, 1998). Another possible effect is long-term depression (LTD). This may be heterosynaptic, that is the synaptic weight may be decreased if there is high post-synaptic activity but low presynaptic activity. Part of the utility of this in the brain is that when combined with LTP in pattern associators and autoassociators, the effect is to remove the otherwise positive correlation that would be produced between different input patterns if all patterns are specified by positive-only firing rates, as they are in the brain (see Rolls and Treves, 1998, Sections 2.4.7 and 3.3.6). The effect of removing this correlation is to reduce interference between patterns, and to maximize the storage capacity. A further useful property of heterosynaptic LTD is that it can help to maintain the total synaptic strength onto a neuron constant, by decreasing synaptic strengths from inputs to a neuron which are inactive when the neuron is currently firing. This can be very useful in competitive networks to prevent some winning neurons from continually increasing their synaptic strength so that they win to all patterns, and can in addition be seen as part of the process by which the synaptic weight vector is moved to point in the direction of a current input pattern of neuronal activity (see Rolls and Treves, 1998, Chapter 4). Another type of LTD is homosynaptic, in which the synaptic strength decreases if there is high presynaptic activity but low or moderate postsynaptic activity. This might be useful in autoassociative synaptic networks, as combined with LTP and heterosynaptic LTD it can produce a covariance-like learning rule (see Rolls and Treves, 1998, Section A3.1.4), and it may also be useful in forcing neurons to be allocated to one feature or another in feature analyzers if combined with a sliding sensitivity factor that depends on the average post-synaptic activity of a neuron (Bienenstock et al., 1982). Another learning rule of potential biological importance is a trace learning rule, in which for example the post-synaptic term is a short term average in an associative Hebb-like learning rule. This encourages neurons to respond to the current stimulus in the same way as they did to previous stimuli. This is useful if the previous stimuli are different views etc. of the same object (which they tend statistically to be in our visual world), because this promotes invariant responses to different versions of the same object. Use of such a rule has been proposed to be one way in which networks can learn invariant responses (Foldiak, 1991; Rolls, 1992; Wallis and Rolls, 1997; Rolls and Treves, 1998; Rolls and Milward, 2000; Rolls and Stringer, 2000). Such a trace could be implemented in real neurons by a number of different mechanisms, including slow unbinding of glutamate from the NMDA receptor (which may take 100 ms or more), and maintaining a trace of previous neuronal activity by using short term autoassociative attractor memories implemented by recurrent collaterals in the cerebral neocortex (Rolls and Treves, 1998). Other types of synaptic modification that may be genetically specified include non-associative LTP (as may be implemented by the hippocampal mossy fibre synapses), and non-associative LTD. Other genes working with these may set parameters such as the rapidity with which synapses learn (which in a structure such as the hippocampus may be very fast, in one trial, to implement memory of a particular episode, and in structures such as the basal ganglia may be slow to enable the learning of motor habits based on very many trials of experience); and the initial and maximal values of synapses (e.g., the mossy fibre synapses onto hippocampal CA3 cells can achieve high values).

Another set of genes specifies some of the biophysical parameters that control the operation of individual neurons (see Koch (1999) for background). One gene (in a simulation, or biologically perhaps several) specifies how the activation h_i of a neuron i is calculated. A linear sum of the inputs r'_j weighted by the synaptic weights w_ij is the standard one used in most models of neuronal networks (Rolls and Treves, 1998), and those simulated here, as follows:

h_i = \sum_j r'_j w_{ij}    (1)

where \sum_j indicates that the sum is over the C input axons (or connections) indexed by j. An alternative is that there is non-linearity in this process, produced for example by local interactions in dendrites, including

local shunting, affected most notably by the cable diameter of the dendrite, which is what such genes may control. For most pyramidal cells, the dendrite diameter is sufficiently large that linear summation to produce the net current injected into the cell bodies is a reasonable approximation (Koch, 1999). Several further genes set the activation function of the neuron. (The activation function is the relation between the activation of the neuron and its firing. Examples are shown in Fig. 5.) One possibility is linear. A second, and the most biologically plausible, is to have a threshold, followed by a part of the curve where the firing rate increases approximately linearly with the activation, followed by a part of the curve where the firing rate gradually saturates to a maximum. This function can be captured by, for example, a sigmoid activation function as follows:

r_i = \frac{1}{1 + e^{-2\beta(h_i - \alpha)}}    (2)

where \alpha and \beta are the sigmoid threshold and slope, respectively. The output of this function, also sometimes known as the logistic function, is 0 for an input of -\infty, 0.5 for h_i equal to \alpha, and 1 for +\infty. For this type of activation function, at least two genes would be needed (and biologically there would probably be several to specify the biophysical parameters), one to control the threshold, and a second to control the slope. A third possibility is to have a binary threshold, producing a neuron which moves from zero activity below threshold to maximal firing above threshold. This activation function is sometimes used in mathematical modelling because of its analytic tractability. One variable which is controlled by the threshold (and to some extent the slope) of the activation function is the proportion of neurons in a population that are likely to be firing for any one input. This is the sparseness of the representation. If the neurons have a binary activation function, the sparseness may be measured just by the proportion of active neurons, and takes the value 0.5 for a fully distributed representation, in which half of the neurons are active. For neurons with continuous activation functions, the sparseness a may be defined as

a = \frac{\left( \sum_i r_i / N \right)^2}{\sum_i r_i^2 / N}    (3)

where N is the number of neurons in the layer (Rolls and Treves, 1998). This works also for binary neurons. To provide precise control of the sparseness in some of the simulations described below, we made provision for this to be controlled as an option directly by one gene, rather than being determined indirectly by the threshold and slope gene-specified parameters of the sigmoid activation function shown in Eq. (2).
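For concreteness, a minimal Python sketch of Eqs. (2) and (3) is given below; it is written for this exposition rather than taken from the paper's simulations, and the parameter names follow the text.

import numpy as np

def sigmoid_firing(h, alpha, beta):
    """Eq. (2): firing rate from activation h, with threshold alpha and slope beta."""
    return 1.0 / (1.0 + np.exp(-2.0 * beta * (h - alpha)))

def sparseness(r):
    """Eq. (3): a = (sum_i r_i / N)^2 / (sum_i r_i^2 / N)."""
    r = np.asarray(r, dtype=float)
    N = r.size
    return (r.sum() / N) ** 2 / (np.square(r).sum() / N)

# Example: a binary vector with half the neurons active has sparseness 0.5,
# and the sigmoid gives a firing rate of 0.5 when h equals the threshold.
print(sparseness([1, 0, 1, 0]))                                  # 0.5
print(sigmoid_firing(np.array([0.0, 1.0, 2.0]), alpha=1.0, beta=2.0))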

3. Genetic selection of neural network parameters to produce different network architectures with different functions

Given the proposals just described for the types of parameter that are determined by different genes, we describe next how gene selection is postulated to lead to the evolution of neuronal networks adapted for particular (computational) functions. The description is phrased in terms of simulations of the processes using genetic algorithms, as investigations of this type enable the processes to be studied precisely. The processes involve reproduction, ontogenesis followed by tests of how well the offspring can learn to solve particular problems, and natural selection. First, a selection of genes is made for a set of G genotypes in a population, which should be of a certain minimum size for evolution to work correctly. The genes are set out on a chromosome (or chromosomes). (Effects of gene linkage on a chromosome are not considered here.) Each set of genes is a genotype. The selection of individual genotypes from which to breed is made a probabilistic function which increases with the fitness of the genotype, measured by a fitness function that is quantified by how well that genotype builds an individual that can solve the computational problem that is set. Having chosen two genotypes in this way, two genotypes to specify two new (haploid) offspring for the next generation are made by the genetic processes of sexual reproduction involving both gene recombination and mutation, which occur with specified probabilities. This process is repeated until G genotypes have been produced. Then G individuals are built with the network architectures specified by the G genotypes. The fitness of these individuals is then measured by how well they perform at the computational problem set. In order to solve the computational problem, the networks are trained by presenting the set of input patterns, and adjusting the synaptic weights in the network according to the learning rules specified by the genotype of that individual. The individuals then breed again with a probability of being selected for reproduction that is proportional to their fitness relative to that of the whole population. This process is allowed to proceed for many generations, during which the fitness of the best individual in the population, and the average fitness, both increase if evolution is working. This type of genetic process is an efficient method of searching through a high dimensional space (the space specified by the genes), particularly where the space has many local optima so that simple hill climbing is inefficient, and where there is a single measure of fitness (Holland, 1975; Ackley, 1987; Goldberg, 1989). An additional useful feature of genetic search is that past gene combinations, useful possibly in other contexts, can remain in the population for a number of generations, and can then be reused later on, without the need to search for those gene combinations again. This re-use of past combinations is one of the features of genetic search that can make it powerful, rapid, and show sudden jumps forward.

Fig. 5. Activation functions.
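As a schematic illustration of the generational cycle just described (fitness-proportional selection, breeding with recombination and mutation, ontogenesis, and testing), the loop below sketches the process in Python. It is not the authors' implementation; fitness_fn, recombine and mutate are placeholders standing for the steps described in the text and detailed in Section 4.

import random

def evolve(initial_genotypes, generations, fitness_fn, recombine, mutate):
    population = list(initial_genotypes)          # G genotypes
    for gen in range(generations):
        # Ontogenesis and testing: build each individual and measure its fitness.
        fitnesses = [fitness_fn(g) for g in population]
        # Breed the next generation, selecting parents in proportion to fitness.
        next_population = []
        while len(next_population) < len(population):
            parent_a, parent_b = random.choices(population, weights=fitnesses, k=2)
            child_a, child_b = recombine(parent_a, parent_b)
            next_population += [mutate(child_a), mutate(child_b)]
        population = next_population[:len(population)]
    return population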

4. Simulation of the evolution of neural networks using a genetic algorithm

Next in this paper we describe a simulation of these processes using a genetic algorithm. The aims of the simulations are to demonstrate the feasibility and details of the proposal; to provide an indication of whether the proposed genes can be used to guide efficiently the evolution of networks that can solve computational problems of the type solved by the brain; and to provide a tool for investigating in more detail both the parameters that can best and biologically plausibly be genetically specified to define neural networks in the brain, and more complicated multilayer networks that operate to solve computationally difficult problems such as episodic memory or invariant visual object recognition. In this paper, the proposals are tested to determine whether three different types of one-layer network with different architectures appropriate for solving different computational problems evolve depending on the computational problem set. The problems are those appropriate for the pattern associator, autoassociator, and competitive networks shown schematically in Figs. 1-3.

4.1. The neural networks

The neural networks we consider have a number of classes of neuron. Within a class, a gene allows the number of neurons to vary from 1 up to N. For the simulations described here, N was set to 100. Within a class, the genetic specification of a neuron is homogeneous, with the neurons for that class having, for example, identical activation functions and connectivity probabilities. The number of classes will depend on the individual task, e.g. pattern association, autoassociation, competitive nets, etc., which can be described as one-layer networks, with one layer of computing elements between the input and output (see Figs. 1-3). However, in our simulations there is typically an input layer where patterns are presented, and an output layer where the performance of the network is tested. For the simulations described here, the number of classes was set to allow one-layer networks such as these to be built, but the number of layers that could be formed is in principle under genetic control, and indeed multilayer networks can be formed if there is a sufficient number of classes of neuron. Regarding the implementation of inter-layer connection topologies, the neurons in individual layers exist in a circular arrangement, and connections to a cell in one layer are derived from a topologically related region of the preceding layer. Connections to individual neurons may then be established according to either a uniform or Gaussian probability distribution centered on the topologically corresponding location in the sending layer.
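A sketch of this wiring scheme, under our own function and parameter names rather than those of the published simulation, is given below: neurons are treated as lying on a ring, and each receiving cell draws its afferents from a region of the sending layer centred on the topologically corresponding location, using either a uniform or a Gaussian rule.

import numpy as np

rng = np.random.default_rng(0)

def choose_afferents(post_index, n_post, n_pre, n_connections, region, gaussian=False):
    """Return indices of sending-layer neurons contacted by one receiving neuron."""
    centre = int(round(post_index * n_pre / n_post))      # corresponding location
    if gaussian:
        offsets = rng.normal(0.0, region, size=n_connections)
    else:
        offsets = rng.uniform(-region / 2.0, region / 2.0, size=n_connections)
    return (centre + np.rint(offsets).astype(int)) % n_pre   # wrap on the circular layer

# Example: neuron 10 of a 100-neuron layer receiving 20 connections drawn
# uniformly from a region of 30 neurons in a 100-neuron sending layer.
print(choose_afferents(10, 100, 100, 20, region=30))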

On discrete time steps each neuron i calculates a weighted sum of its inputs as shown in Eq. (1). This is in principle the subject of genetic modification (see below), but the gene specifying this was set for the simulations described here to this type of calculation of the neuronal activation. Next the neuronal firing r_i is calculated, using for example the standard sigmoid function shown in Eq. (2) and allowing \alpha and \beta, the sigmoid threshold and slope, to evolve genetically. Next, there is an optional procedure that can be specified by the experimenter to be called to set the sparseness of the firing rates r_i of a class of neuron according to Eq. (3), with a being allowed to evolve. After the neuronal outputs r_i have been calculated, the synaptic weights w_ij are updated according to one of a number of different learning rules, which are capable of implementing for example both long term potentiation (LTP) and long term depression (LTD), as described below, and which are genetically selected. For example, the standard Hebb rule takes the form

\Delta w_{ij} = k r_i r'_j    (4)

where r'_j is the presynaptic firing rate and k is the learning rate.
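Put together, one discrete time step of such a network can be sketched as follows (our notation, for a single class of neurons): linear activation as in Eq. (1), sigmoid firing as in Eq. (2), and the Hebb update of Eq. (4).

import numpy as np

def step(W, r_pre, alpha, beta, k):
    """W: weight matrix (n_post x n_pre); r_pre: presynaptic rates r'_j; k: learning rate."""
    h = W @ r_pre                                          # Eq. (1): h_i = sum_j r'_j w_ij
    r = 1.0 / (1.0 + np.exp(-2.0 * beta * (h - alpha)))    # Eq. (2): sigmoid firing
    W += k * np.outer(r, r_pre)                            # Eq. (4): dw_ij = k r_i r'_j
    return r, W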

4.2. The specification of the genes

The genes that specify the architecture and operation of the network are described next. In principle, each gene evolves genetically, but for particular runs, particular genes can be set to specified values to allow investigation of how other genes are selected when there are particular constraints. Each neural network architecture is described by a genotype consisting of a single chromosome of the following form

chromosome = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}    (5)

where c_l is a vector containing the genes specifying the properties of neurons in class l, and n is the total number of classes that is set manually at the beginning of a run of the simulation. The vectors c_l take the form

c_l = \begin{bmatrix} g_l \\ h_{l1} \\ h_{l2} \\ \vdots \\ h_{ln} \end{bmatrix}    (6)

where the vector g_l contains the intra-class properties for class l, and the vectors h_{lm} contain the inter-class connection properties to class l from class m, where m is in the range 1 to n.

The vector of intra-class properties takes the form

g_l = \begin{bmatrix} b_l \\ \alpha_l \\ \beta_l \\ a_l \end{bmatrix}    (7)

where we have the following definitions for intra-class genes.

(1) b_l is the number of neurons in class l. b_l is an integer bounded between 2 and N, which was set to 100 for the simulations described here. Individual classes are restricted to contain more than one neuron since a key strategy we have adopted for enhancing the biological plausibility is to evolve classes composed of a number of neurons with homogeneous genetic specifications, rather than to specify genetically the properties of individual single neurons.

(2) \alpha_l is the threshold of the sigmoid transfer function (2) for class l. \alpha_l is a real number bounded within the interval [0.0, 200.0].

(3) \beta_l is the slope of the sigmoid transfer function (2) for class l. \beta_l is a real number bounded within the interval [0.0, 200.0]. We note that low values of the slope will effectively specify a nearly linear activation function, whereas high values of the slope specify a nearly binary activation function (see Fig. 5).

(4) a_l is the sparseness of firing rates within class l as defined by Eq. (3). By definition a_l is a real number bounded within the interval [0.0, 1.0]. However, in practice we ensure a minimum firing sparseness by setting a_l to lie within the interval [1.0/b_l, 1.0], where b_l is the number of neurons in class l. With binary neurons with output r equal to either 0 or 1, setting a_l >= 1.0/b_l ensures that at least one neuron is firing. For the simulations described in this paper, when this gene was being used, a was set to 0.5 and produced a binary firing rate distribution with a sparseness of 0.5 unless otherwise specified. The alternative way of calculating the firing rates was to allow genes \alpha and \beta specifying the sigmoid activation function to evolve.
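To make the genotype layout of Eqs. (5)-(7) concrete, the following sketch represents a chromosome as a plain Python data structure. The field names are our own shorthand, not identifiers from the authors' simulation, and the inter-class gene vectors h_lm are left as generic dictionaries to be filled with the genes defined in the remainder of this section.

from dataclasses import dataclass, field

@dataclass
class IntraClassGenes:                 # the vector g_l of Eq. (7)
    n_neurons: int                     # b_l, number of neurons in the class
    threshold: float                   # alpha_l, sigmoid threshold
    slope: float                       # beta_l, sigmoid slope
    sparseness: float                  # a_l, enforced firing sparseness

@dataclass
class ClassGenes:                      # one entry c_l of the chromosome, Eq. (6)
    intra: IntraClassGenes
    # maps source class index m to its inter-class gene vector h_lm
    # (connection number, sign, learning rule, etc., as defined below)
    inter: dict = field(default_factory=dict)

# The chromosome of Eq. (5): one ClassGenes entry per neuron class.
chromosome = [
    ClassGenes(IntraClassGenes(n_neurons=100, threshold=1.0, slope=2.0, sparseness=0.5)),
    ClassGenes(IntraClassGenes(n_neurons=100, threshold=1.0, slope=2.0, sparseness=0.5)),
]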

The vectors of inter-class connection genes specify the connections to class l from class m, and take the form

h_{lm} = \begin{bmatrix} r_{lm} \\ s_{lm} \\ c_{lm} \\ e_{lm} \\ z_{lm} \\ t_{lm} \\ p_{lm} \\ \sigma_{lm} \\ q_{lm} \\ f_{lm} \\ k_{lm} \\ d_{lm} \\ u_{lm} \\ v_{lm} \end{bmatrix}    (8)

where we have the following definitions for inter-class genes.

(1) r_{lm} controls the inter-layer connection topology in that it helps to govern which neurons in class m make connections with individual neurons in class l. The exact definition of r_{lm} depends on the type of probability distribution used to make the connections, which is governed by the gene s_{lm} described below. For s_{lm} = 0, connections are established according to a uniform distribution, and r_{lm} specifies the number of neurons within the spatial region from which connections may be made. For s_{lm} = 1, connections are established according to a Gaussian distribution, where r_{lm} specifies the standard deviation of the distribution. For the Gaussian distribution, individual connections originate with 68% probability from within a region of 1 standard deviation away. These real valued variates are then rounded to their nearest integer values. As noted in Section 2, connection topologies are characteristically different for different connection types such as CA3 recurrent collateral connections, which are widespread, and intra-module connections in the cerebral cortex. This parameter may also be set by genetic search to enable the region of effect of inhibitory feedback neurons to be just greater than the region within which excitatory neurons receive their input; and to enable topographic maps to be built by arranging for the lateral excitatory connections to operate within a smaller range than inhibitory connections. This parameter may also be set to enable multilayer networks to be built with feedforward connectivity which is from a small region of the preceding layer, but which over many layers allows an output neuron to receive from any part of the input space, as happens in many sensory systems in which topology is gradually lost, and as is implemented in a model of the operation of the visual cortical areas (Wallis and Rolls, 1997; Rolls and Milward, 2000). For the simulations described here, this gene was set to 100 to allow global connectivity, and thus to not specify local connectivity.

(2) s_{lm} specifies for the region gene r_{lm} the connection probability distribution to class l from class m. s_{lm} takes the following values.

• s_{lm} = 0: Connections are established according to a uniform probability distribution within the region r_{lm}.
• s_{lm} = 1: Connections are established according to a Gaussian probability distribution. r_{lm} specifies the size of the region from which there is a 68% probability of individual connections originating.

In the experiments described here, the values of s were set to 0.

(3) c_{lm} is the number of connections each neuron in class l receives from neurons in class m. c_{lm} takes integer values from 0 to b_m (the number of neurons within class m). These connections are made probabilistically, selecting for each class l of neuron c_{lm} connections from class m. The probabilistic connections are selected either from a uniform probability distribution (as used here), or according to a Gaussian probability distribution. In both cases the local spatial connectivity within the two classes l and m is specified by the region parameter gene r_{lm}.

(4) e_{lm} specifies whether synapses to class l from class m are excitatory or inhibitory. e_{lm} takes the following values.

• e_{lm} = 0: Connections are inhibitory
• e_{lm} = 1: Connections are excitatory

For inhibitory connections with e = 0, the weights in Eq. (1) are scaled by -1, and no learning is allowed to take place.

(5) z_{lm} specifies whether connections from class m to class l are additive or divisive. z_{lm} takes the following values.

• z_{lm} = 0: Connections are divisive
• z_{lm} = 1: Connections are additive

This gene selects whether the neuronal activation h_i is calculated according to linear summation of the input firing rates r'_j weighted by the synaptic strengths w_ij, as specified by Eq. (1), and as used throughout this paper by setting z_{lm} = 1; or whether with z_{lm} = 0 there is local computation on a dendrite, including perhaps local summation of excitatory inputs and shunting effects produced by inhibitory inputs or inputs close to the cell body. The most important physical property of a neuron that would implement the effects of this gene would be the diameter of the dendrite (see Section 2 and Koch, 1999). If set so that z_{lm} = 0 for local multiplicative and divisive effects to operate, it would be reasonable and in principle straightforward to extend the genome to include genes which specify where on the dendrite with respect to the cell body synapses are made from class m to class l neurons, the cable properties of the dendrite, etc.

(6) t_{lm} specifies how the synaptic weights to class l from class m are initialised at the start of a simulation. t_{lm} takes the following values.

• t_{lm} = 0: Synaptic weights are set to zero. This might be optimal for the conditioned stimulus inputs for a pattern associator, or the recurrent collateral connections in an autoassociator, for then existing connections would not produce interference in the synaptic connections being generated during the associative learning. This would produce a non-operating network if used in a competitive network, for an input could not produce any output.

• t_{lm} = 1: Synaptic weights are set to a uniform deviate in the interval [0, 1] scaled by the constant value encoded on the genome as q_{lm} (described below).

• t_{lm} = 2: Synaptic weights are set to the constant value encoded on the genome as q_{lm} (described below). This could be useful in an associative network if a learning rule that allowed synaptic weights to decrease as well as increase was specified genetically, because this would potentially allow correlations between the input patterns due to positive-only firing rates to be removed, yet prevent the synapses from hitting the floor of zero, which could lose information (Rolls and Treves, 1998).

• t_{lm} = 3: Synaptic weights are set to a Gaussian function of distance x away from the afferent neuron in class l. This function takes the form

f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-x^2 / 2\sigma^2}

where \sigma is the standard deviation. This would be an alternative way to implement local spatial connectivity effects to those implemented by r_{lm}.
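A sketch of these four initialisation options is given below. The function signature is ours; the scale factor p and standard deviation sigma correspond to genes (7) and (8) described next, and the distance x is approximated here by the index difference on the layer (ignoring the circular wrap used in the simulations).

import numpy as np

rng = np.random.default_rng(0)

def init_weights(t, n_post, n_pre, q=1.0, p=1.0, sigma=1.0):
    if t == 0:                                     # zero weights
        return np.zeros((n_post, n_pre))
    if t == 1:                                     # uniform deviates in [0, 1] scaled by q
        return q * rng.uniform(0.0, 1.0, (n_post, n_pre))
    if t == 2:                                     # constant value q
        return np.full((n_post, n_pre), q)
    if t == 3:                                     # Gaussian function of distance x, scaled by p
        x = np.abs(np.arange(n_post)[:, None] - np.arange(n_pre)[None, :])
        return p * np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    raise ValueError("t_lm must be 0, 1, 2 or 3")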

(7) p_{lm} is the scale factor used for synaptic weight initialisation with a Gaussian distribution (i.e., for t_{lm} = 3). p_{lm} is a real number bounded within the interval [0.0, 100.0].

(8) \sigma_{lm} is the standard deviation used for synaptic weight initialisation with a Gaussian distribution (i.e., for t_{lm} = 3). \sigma_{lm} is a real number bounded within the interval [0.001, 10.0]. \sigma_{lm} is restricted to be greater than 0.001 to avoid the singularity that would occur in the Gaussian function for \sigma_{lm} = 0.0.

(9) q_{lm} is the scale factor used for synaptic weight initialisation for t_{lm} = 1 and t_{lm} = 2 (see above). q_{lm} is a real number bounded within the interval [0.0, 100.0]. If t_{lm} is 1 or 2, q_{lm} would be expected to be small for pattern associators and autoassociators (as described above).

(10) f_{lm} specifies which learning rule is used to update weights at synapses to class l from class m. f_{lm} takes the following values.

• f_{lm} = 0: Learning rule 0 (no learning) \Delta w_{ij} = 0.

• f_{lm} = 1: Learning rule 1 (Hebb rule) \Delta w_{ij} = k r_i r'_j.

• f_{lm} = 2: Learning rule 2 (modified Hebb rule) \Delta w_{ij} = k r_i (r'_j - \langle r'_j \rangle), where \langle \cdot \rangle indicates an average value. In this rule LTP is incorporated with heterosynaptic LTD set to keep the average weight unchanged. Use of this rule can help to remove the correlations between the input patterns produced by positive-only firing rates (Rolls and Treves, 1998). If this rule is selected, it will work best if the synaptic weight initialisation selected genetically is not zero (to allow the weights to decrease without hitting a floor), and is optimally a constant (so that noise is not added to the synaptic weights which implement the association).

• f_{lm} = 3: Learning rule 3 (modified Hebb rule) \Delta w_{ij} = k r_i (r'_j - w_{ij}). This rule has the effect of helping to maintain the sum of the synaptic weights on a neuron approximately constant, and is therefore helpful in competitive networks (Rolls and Treves, 1998). This rule accords with the neurophysiological observation that it is easier to demonstrate LTD after LTP has been produced.

• f_{lm} = 4: Learning rule 4 (covariance rule) \Delta w_{ij} = k (r_i - \langle r_i \rangle)(r'_j - \langle r'_j \rangle).

• f_{lm} = 5: Learning rule 5 (homosynaptic LTD with LTP) \Delta w_{ij} = k (r_i - \langle r_i \rangle) r'_j.

• f_{lm} = 6: Learning rule 6 (non-associative LTP) \Delta w_{ij} = k r'_j. This rule may be implemented in the hippocampal mossy fibre synapses.

• f_{lm} = 7: Learning rule 7 (non-associative LTD) \Delta w_{ij} = -k r'_j.

• f_{lm} = 8: Learning rule 8 (trace learning rule) \Delta w_{ij} = k \bar{r}_i^t r'^t_j, where the trace \bar{r}_i^t is updated according to

\bar{r}_i^t = (1 - \eta) r_i^t + \eta \bar{r}_i^{t-1}    (9)

and we have the following definitions: \bar{r}_i^t is the trace value of the output of the neuron at time step t, and \eta is a trace parameter which determines the time or number of presentations of exemplars of stimuli over which the trace decays exponentially. The optimal value of \eta varies with the presentation sequence length. Further details, and a hypothesis about how this type of learning rule could contribute to the learning of invariant representations, are provided by Wallis and Rolls (1997), Rolls and Treves (1998) and Rolls and Milward (2000). For the simulations described in this paper, this learning rule was disabled.

In some simulations described later, the synaptic weights were clipped to be >= 0, consistent with what is biologically plausible. However, in some simulations we explored the impact on evolution and performance of allowing the synaptic weights to vary from positive to negative. In addition, an implementation detail for rules f = 2, 4 or 5, where an estimate of the average firing rate of a set of neurons \langle r \rangle has to be subtracted, is that \langle r \rangle is taken to be 0.5, which is the average firing of a binary vector with the sparseness value of 0.5 which was used for the test patterns, and was that of the desired output patterns except where stated otherwise.
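The associative subset of these rules (f_{lm} = 0 to 5) can be sketched as a single dispatch function. This is our own illustration rather than the simulation code, with the average rate fixed at 0.5 as in the implementation note above.

import numpy as np

def weight_update(f, r_post, r_pre, W, k, mean_rate=0.5):
    """Return dW for one pattern; r_post are the rates r_i, r_pre the rates r'_j."""
    if f == 0:   return np.zeros_like(W)                                     # no learning
    if f == 1:   return k * np.outer(r_post, r_pre)                          # Hebb rule
    if f == 2:   return k * np.outer(r_post, r_pre - mean_rate)              # Hebb + heterosynaptic LTD
    if f == 3:   return k * (np.outer(r_post, r_pre) - r_post[:, None] * W)  # weight-bounded Hebb
    if f == 4:   return k * np.outer(r_post - mean_rate, r_pre - mean_rate)  # covariance rule
    if f == 5:   return k * np.outer(r_post - mean_rate, r_pre)              # homosynaptic LTD with LTP
    raise ValueError("rule not included in this sketch")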

(11) k_{lm} is the learning rate k for synaptic weights to class l from class m. k_{lm} is a real number bounded within the interval [0.0, 10.0].

(12) d_{lm} is the maximum size allowed for the absolute value of synaptic weight updates from class m to class l. d_{lm} is a real number bounded within the interval [0.0, 10.0].

(13) u_{lm} is the maximum size allowed for synaptic weights to class l from class m. u_{lm} is a real number bounded within the interval [-100.0, 100.0].

(14) v_{lm} is the minimum size allowed for synaptic weights to class l from class m. v_{lm} is a real number bounded within the interval [-100.0, 100.0].

Implementation details are that v is bounded to beless than u; and that during initialisation of the synap-tic weights, v and u are applied after the other initiali-sation parameters have operated.

4.3. The genetic algorithm, and general procedure

The genetic algorithm used is a conventional one described by Goldberg (1989). Each individual genotype was a haploid chromosome of the type just defined. The values for the genes in the initial chromosome were chosen at random from the possible values. (In this simulation, if a value in the genotype was one of a large number of possible values, then a single integer or real-valued gene was used, whereas in the brain we would expect several genes to be used, with perhaps each of the several genes coding for different parts of the range with different precision. For example, c_{lm}, the number of synaptic connections to a neuron in class l from a neuron in class m, is to be chosen from 2 to 100 in the simulation, and uses a single integer. In the brain, where we might expect the number of connections per neuron to vary in the range 5-50,000, one possible scenario is that five different genes would be used, coding for the values 5, 50, 500, 5000 and 50,000. We would thus expect that the number of genes required to specify the part of the genotype described here would increase by a factor in the order of 5 in the brain.) There was a set number of genotypes (and hence individuals) per generation, which was 100 for the simulations described. (Too small a number reduces the amount of diversity in the population sufficiently to lead to poor performance of the genetic evolution.) From each genotype an individual network was constructed, with all connections being chosen probabilistically within the values specified by the genotype, and with a uniform probability distribution (i.e., s_{lm} was set to 0 for the simulations described), and the initial values for the synapses were also chosen as prescribed by the genotype. That individual was then tested on the computation being studied, e.g. pattern association, and the fitness (performance) of the network was measured. The actual testing involved presenting the set of patterns to be learned (as specified below for each network) to the network while allowing learning to occur, and then testing the performance of the network by presenting the same, similar, or incomplete test patterns and measuring the output of the network. The fitness measure used for each class of network is defined below. Unless otherwise stated, each genotype was tested 20 times to obtain an average fitness measure for that genotype. (This step, though not necessary at all for successful evolution, did ensure, given the probabilistic way in which individual connections and their values were assigned, that an accurate measure of the fitness of that genotype was obtained, and hence allowed smoother evolution curves to be plotted. In real life, the numbers specified for the networks would be larger, and the statistical effects of small numbers would be smaller. On the other hand, chance environmental factors might be larger in real life.) This process was repeated until a fitness measure had been obtained for each of the genotypes.

The next generation of genotypes was then bred by reproduction from the previous genotypes. Two genotypes were selected for reproduction with a probability that was proportional to their fitness relative to the sum of the fitnesses of all individuals in that generation. That is, the probability Pi of an individual i being selected is given by

Pi = Fi / Σj Fj                                                    (10)

where Fj is the fitness of an individual genotype j, and the sum Σj Fj is over all individuals in the population.
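A minimal sketch of this fitness-proportional (roulette-wheel) selection, implementing Eq. (10) under the assumption that the fitness values are non-negative and not all zero, is:

    import numpy as np

    def select_pair(population, fitnesses, rng=np.random.default_rng()):
        # Each individual is chosen with probability Fi / sum_j Fj (Eq. 10).
        # Whether the two parents may be the same individual is not specified
        # in the text; here repeats are allowed.
        f = np.asarray(fitnesses, dtype=float)
        i, j = rng.choice(len(population), size=2, p=f / f.sum())
        return population[i], population[j]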

The genotypes were then bred to produce two offspring genotypes using the processes of recombination and mutation. The probability of recombination was set to 0.4 (so that it occurred with a 40% chance for every breeding pair in every generation), and the probability of mutation of each gene was set to 0.05. The computational background to this is that recombination allows a non-linear search of a high dimensional space in which gene complexes or combinations may be important, allowing local performance hills in the space to be detected, and performance once in a local hill to improve. Mutation on the other hand enables occasional moves to be tried to other parts of the space, usually with poorer performance, but occasionally moving a genotype away from a well explored location to a new area to explore. Both processes enable the search to be much more efficient in a high dimensional space with multiple performance hills or mountains than with simple gradient ascent, which will climb the first hill found and become stuck there (Ackley, 1987; Goldberg, 1989). The recombination took place with respect to a random point on the chromosome each time it occurred. The mutation also took place at a random place on the chromosome each time it occurred, and the single gene being mutated was altered to a new random value within the range specified for that gene. This reproduction process was repeated until sufficient individual genotypes for the next generation (100 unless otherwise stated) had been produced. The fitness of those genotypes was then measured by building and testing the networks they specified, as previously outlined. Then the whole process of reproduction, ontogenesis, and testing to measure fitness was repeated for many generations, to determine whether the fitness would increase, and if so, what solutions were found for the successful networks.
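A sketch of this breeding step, with single-point recombination at probability 0.4 per pair and mutation of each gene at probability 0.05 to a new random value within its range, is given below. Representing a genotype as a dictionary of named genes, and allowed_values as a mapping from each gene name to a sampler for that gene's range, are assumptions of this sketch.

    import numpy as np

    P_RECOMBINATION = 0.4   # per breeding pair, per generation
    P_MUTATION = 0.05       # per gene

    def breed(parent_a, parent_b, allowed_values, rng=np.random.default_rng()):
        # allowed_values[name] is a callable drawing a random value in that gene's range.
        names = list(parent_a)
        a, b = dict(parent_a), dict(parent_b)
        if rng.random() < P_RECOMBINATION:
            cut = int(rng.integers(1, len(names)))       # single random crossover point
            for name in names[cut:]:
                a[name], b[name] = b[name], a[name]      # swap the genes beyond the cut
        for child in (a, b):
            for name in names:
                if rng.random() < P_MUTATION:
                    child[name] = allowed_values[name](rng)  # new random value within range
        return a, b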

4.4. Pattern association networks

The computational problem set was to associate pairs of random binary patterns of 1s and 0s with sparseness of 0.5 (i.e. with half the elements 0 and the other half 1s), an appropriate problem for a pattern association net (Rolls and Treves, 1998). There were 10 pattern pairs, which places a reasonable load on the network (Rolls and Treves, 1998). There were two classes of neuron. One of the patterns of the pair, the unconditioned stimulus, was set during learning to activate the 100 output neurons (class 2), while the other pattern of the pair, the conditioned stimulus (see Fig. 1), was provided by the firing of 100 input neurons (class 1). The fitness measure was obtained by providing each conditioned stimulus (or a fraction of it), and measuring the correlation of the output vector of firing rates, ri in Fig. 1 where i = 1 to 100, with that which was presented as the unconditioned stimulus during learning. This correlation measure ranges from 1.0 for perfect recall to 0.0 (obtained for example if the output of the pattern associator is 0). The fitness measure that was used in most runs, and is plotted in the Figures and Tables, was the square of this correlation (used to increase the selectivity of the fitness function). (All of the possible activation functions produced firing rates greater than or equal to 0.) (In practice, a pattern associator with associatively modifiable synapses, random patterns, and the sizes and loading of the networks used here can achieve a maximum correlation of approximately 1.000.) It may be noted that the genetics could form connections onto individual neurons of any number (up to N = 100, the number of input and output neurons in the network) and of any type: between classes 1 to 2 (feedforward connections), 2 to 2 (recurrent collaterals), 2 to 1, and 1 to 1. (However, in the simulations described in this Section, during retrieval the firing rates of class 2 neurons were set to 0 initially, corresponding to no retrieved pattern before the cue is applied, and this had the effect that any recurrent collateral synapses from class 2 to class 2 would not affect retrieval.) For the simulations performed, the operation of inhibitory feedback neurons was not explicitly studied, although when using the sigmoid activation function the neurons had to select a threshold (by altering al). (If the alternative way of calculating the firing rates, using the sparseness parameter al to allow only some output neurons to fire, was selected for a particular run, then al was itself either allowed to evolve, or set to the same value as that of the unconditioned stimulus. When setting the output sparseness in this way using al, the activation function of the neurons was linear.)
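For concreteness, the binary firing with a genetically set sparseness and the squared-correlation fitness described above might be implemented as in the following sketch. The use of a Pearson correlation clipped at zero is an assumption here, since the exact normalisation of the correlation is not restated in the text.

    import numpy as np

    def binary_rates(activations, sparseness):
        # The fraction `sparseness` of neurons with the highest activations fire at 1,
        # the remainder at 0 (the linear-activation option described in the text).
        n_firing = int(round(sparseness * len(activations)))
        r = np.zeros(len(activations))
        if n_firing > 0:
            r[np.argsort(activations)[-n_firing:]] = 1.0
        return r

    def pattern_association_fitness(retrieved, target):
        # Squared correlation of the retrieved firing-rate vector with the
        # unconditioned stimulus; an all-zero output scores 0.
        if retrieved.std() == 0 or target.std() == 0:
            return 0.0
        c = np.corrcoef(retrieved, target)[0, 1]
        return max(c, 0.0) ** 2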

The values of the fitness measures for a typical run of the simulations are shown in Fig. 6, along with the values of specific genes. The genes shown are key connection properties from class 1 to class 2, and are: learning rate k21, number of connections c21, weight scaling q21, weight initialisation option t21, and the learning rule f21. In the figure the values of the different variables are normalised in the following way. The net fitness, number of connections c21, weight initialisation t21, and learning rule f21 are normalised by their maximum possible values (1.0, 100, 3, 8, respectively), while the learning rate k21 and weight scaling q21 are normalised by the maximum values that actually occurred during the simulation (7.9 and 52.99, respectively). The fitness measure is the fitness of the best genotype in the population, and for this run the sparseness of the output representation was set to the value of 0.5 by using the sparseness gene set to this value. The synaptic weights were clipped to be ≥ 0 for the run shown in this figure and in the other figures included in this paper. It is shown that evolution to produce genotypes that specify networks with high performance occurred over a number of generations. Some genes, such as that specifying the learning rule, settled down early, and changes in such genes can sometimes be seen to be reflected in jumps in the fitness measure. An example in Fig. 6 is the learning rule, which settled down (for the best genotype) by generation 3 to a value of 4, which selects the covariance learning rule. Another example is that there is a jump in the fitness measure from generation 4 to 5 which occurs when the synaptic weight initialisation gene t21 changes from a Gaussian distribution to a constant value. This would increase the performance, as described below. The simultaneous jump of the synaptic weight scaling q21 from a high value to a low value, on the other hand, would not be expected to increase the fitness with learning rule 4 and a constant weight initialisation, again as made clear below. The increase in fitness from generation 7 to 8 occurred when the minimum weight gene v21 changed from 11.6 to -62.5 (not plotted), which would also be expected to increase performance, as described below. For other genes, where the impact is less or zero, the values of the genes can fluctuate throughout evolution, because they are not under selection pressure, in that they do not contribute greatly to fitness. Every run of the simulation eventually produced networks with perfect performance. The selection pressure (relative to other genotypes) could be increased by taking a power greater than 1 of the correlation fitness measure. A high value could produce very rapid evolution, but at the expense of minimising diversity early on in evolution, so that potentially good genes were lost from the population, with a longer time then being taken later to reach optimal performance because of the need to rely on mutations. Low values resulted in slow evolution. In practice, raising the correlation fitness measure to a power of 1 (no change) or 2 produced the best evolution. (For the pattern associator and autoassociator, the fitness measure used and plotted in the figures was the correlation measure raised to the power 2.) The best networks all had close to 100 excitatory feedforward connections (gene c21) to each class 2 (output) neuron from the class 1 (input) neurons, which is the minimal number necessary for optimal performance of a fully loaded pattern associator and the maximum number allowed. Moreover, the excitatory feedforward connections for all the successful networks were specified as associative, and in particular f21 was selected for these networks as 2 or 4. In addition, with the synaptic weights initialised according to a uniform probability distribution (t21 = 1), relatively successful networks required genes which specified for the associative feedforward connections low initial synaptic weights (using appropriate values for q21), and/or high learning rates k21, to minimise the synaptic initialisation noise introduced into the associatively learned connections (see further discussion below). However, through the processes of recombination and mutation, such genes might also become combined with a learning rule and weight initialisation combination (e.g., f21 = 4 and t21 = 2) that would not itself require low initial synaptic weights for good performance. This is demonstrated in Fig. 6, where the best fit genotype of the last generation has f21 = 4, t21 = 2, q21 = 1.02 and k21 = 7.28.

Fig. 6. Fitness of the best genotype in the pattern association task as a function of the number of generations, together with the values of selected genes. The simulation is performed with the firing rates assigned binary values of 0 or 1 according to the neuronal activations, with the proportion of neurons firing set by the firing rate sparsity gene a.

An example of a run of the pattern association network using the sigmoid activation function, in which the genes that specify the parameters for the threshold al and the slope bl were allowed to evolve for classes 1 and 2, is shown in Fig. 7. In this figure the genes are normalised in a similar manner to Fig. 6, with the additional genes sigmoid slope b2 and sigmoid threshold a2 for class 2 also normalised by the maximum values that occurred during the simulation. (For Fig. 7 the values of the learning rate k21, weight scaling q21, sigmoid slope b2 and sigmoid threshold a2 are normalised by 0.12, 90.34, 2.76, 2.95, respectively.) When the fitness increased from generation 7 to 8, the sigmoid threshold a2 decreased. Between generations 16 and 17, when there was only a small improvement in fitness, the threshold increased (from 0.96 to 2.67), but so also did the learning rate k21, thus maintaining the ratio of activation to threshold approximately constant. This illustrates the way in which genes can interact with each other. If in evolution the value of a gene settles or becomes relatively fixed early on, other genes may evolve to values that allow the network to operate well even given the constraints forced by relatively fixed values of other genetically set parameters. The data shown in Fig. 7 are typical of many runs in which genetic evolution could find appropriate values for the threshold a2 and slope b2 of the activation function of neurons (as well as of all the other genes) to lead to the production of output firing rate patterns with the appropriate sparseness, or more generally firing rate distributions, to achieve optimal results in a pattern associator.

Fig. 7. Fitness of the best genotype in the pattern association task as a function of the number of generations, together with the values of selected genes. The simulation is performed with the firing rates calculated according to the sigmoid activation function (2) using the values of the genes a and b.

To help understand in full detail the actual gene selections found by the genetic algorithm in the simulations, we simulated the pattern associator with manually selected gene combinations for the feedforward connections from class 1 to class 2. In particular, we calculated the network fitnesses for particular combinations of learning rule f21, weight initialisation option t21, and weight scaling q21. (This optional facility of the simulator allowed systematic investigation of the effects of one gene, which could be held fixed at a particular value, on the values of the other genes selected during evolution.) The results of some of these simulations are shown in Tables 1 and 2, both for the situation that is normal biologically and was used in most of the runs performed, in which the weights were clipped to be ≥ 0, and for the situation in which they were allowed to become negative. (In all cases, the firing rates were ≥ 0, and the learning rate k21 was set to 1.) It is shown in Table 1 that with the weights clipped at zero, the best performance (1.000) was obtained with the learning rule f21 set to 2 (LTP plus heterosynaptic long-term depression), or to 4 (covariance), provided that the weights were initialised to a high enough positive constant value (for the simulations performed, to a value of 5, as indicated in the table) that when decremented by the learning rule during learning the weights did not hit the floor of 0. (If the weights were not clipped to be ≥ 0 during learning, then as shown in Table 2 both these rules were again the best, but the weight initialisation could in addition be set to zero or any constant.) These results are in accord with what would be predicted (Rolls and Treves, 1998), and help to show why the genetic algorithm tended to produce gene combinations which included rule 2 or rule 4 in combination with weight initialisation to a positive constant sufficiently great to enable the synapses not to hit the floor of 0. In practice most of the runs had best genotypes that specified f21 = 4, the covariance learning rule, because, as shown in Table 1, this would be more likely to work for a wide range of values of the genes, including t21 and q21, and therefore probabilistically would be likely to be selected. We note that one reason that the covariance learning rule can produce good performance even when the weight initialisation is to small constant values (e.g., t21 = 2, q21 = 0.5) is that with 0 pre- and post-synaptic firing rates, the weight will be given a positive increment by the learning rule, which helps to lift the weights off the floor of 0. The Hebbian associative rule without weight decrementing (f21 = 1) was not chosen for the fit genotypes, given the poor maximum fitness obtainable without weight decrementing (shown in Table 2), which is required to remove the positive correlation introduced by firing rates ≥ 0 (Rolls and Treves, 1998). The associative learning rule 3, envisaged to be useful in competitive networks, was not selected for the successful genotypes in pattern association, as it produced only poor maximum fitness values, as shown in Tables 1 and 2. In addition, Tables 1 and 2 show that if the weight initialisation is drawn from a continuous probability distribution (t21 = 1), then performance is less good because this introduces noise into the synaptic matrix, the effects of which can be partly ameliorated by a high value of the learning rate constant k21. Consistent with this, the optimal genotypes found during evolution always selected t21 = 2 (weights initialised to a positive constant), or, if the weights were not clipped to be ≥ 0 during learning, sometimes t21 was selected to be 0 (weights initialised to 0).

Table 1
Performance of pattern association networks with clipping of synaptic weights to be ≥ 0 (a)

          Synaptic weight initialisation option, for different values of t and q
          t=0      t=1, q=0.5   t=1, q=5.0   t=1, q=50.0   t=2, q=0.5   t=2, q=5.0   t=2, q=50.0
Rule 1    0.261    0.263        0.243        0.047         0.259        0.257        0.256
Rule 2    0.573    0.695        0.581        0.025         0.826        1.000        1.000
Rule 3    0.135    0.139        0.130        0.002         0.136        0.135        0.001
Rule 4    0.878    0.965        0.568        0.023         0.996        1.000        1.000
Rule 5    0.240    0.252        0.241        0.051         0.253        0.258        0.256
Rule 6    0.000    0.007        0.009        0.007         0.000        0.000        0.000
Rule 7    0.000    0.000        0.007        0.007         0.000        0.000        0.000

(a) The performance measure is the fitness, which was r^2, where r is the average over the 10 training patterns of the correlation of the pattern retrieved from the network with the training pattern. The different columns are for the different types of weight initialisation gene (t = 0, weights initialised to 0; t = 1, weights initialised to a uniform probability distribution; t = 2, weights initialised to a constant value) and weight scaling gene q.
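To make the comparison of the rules in Tables 1 and 2 easier to follow, the sketch below gives one common textbook form for the rules discussed (cf. Rolls and Treves, 1998). The exact expressions used in the simulations are those defined for the learning rule gene f in Section 4.2, so the particular forms and the mean-rate parameters written here should be read as assumptions rather than as the implemented rules.

    import numpy as np

    def weight_update(rule, w, r_post, r_pre, k, mean_pre=0.5, mean_post=0.5):
        # w: weight vector of one receiving neuron; r_pre: presynaptic rate vector;
        # r_post: the neuron's firing rate; k: learning rate gene.
        if rule == 1:                                  # Hebbian, no weight decrement
            dw = k * r_post * r_pre
        elif rule == 2:                                # LTP plus heterosynaptic LTD
            dw = k * r_post * (r_pre - mean_pre)
        elif rule == 3:                                # LTP with LTD depending on the existing weight
            dw = k * r_post * (r_pre - w)
        elif rule == 4:                                # covariance
            dw = k * (r_post - mean_post) * (r_pre - mean_pre)
        else:                                          # rule 0 (no learning); rules 5-7 omitted here
            dw = np.zeros_like(w)
        return w + dw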

4.5. Autoassociative networks

The computational problem set was to learn random binary pattern vectors (of 1s and 0s with sparseness of 0.5, i.e. with half the elements 0 and the other half 1s), and then to recall the whole pattern when only half of it was presented. This is called completion, and is an appropriate test for an autoassociator (Rolls and Treves, 1998). An additional criterion, inherent in the way the simulation was run, was to perform retrieval by using iterated processing in an attractor-like state of steady memory retrieval, like that of continuing neuronal firing in a short-term memory (Rolls and Treves, 1998). A one-layer autoassociation network capable of this is shown in Fig. 2, and in our simulations there was a single class of neurons forming an output layer. There were 10 patterns to be learned, which loads the network to approximately 70% of its theoretical capacity (Rolls and Treves, 1998). During learning, a 100-element pattern was presented as the external input to force the 100 output neurons to fire with that pattern, and synaptic modification was allowed to occur by whichever learning rule was selected for that genotype. The fitness measure was obtained by providing half (the first 50 elements) of each pattern to the network, allowing the network to iterate 10 times (which is sufficient for retrieval in a normally operating autoassociation memory with discrete time steps) with the input removed after the first iteration, and measuring the correlation of the output vector of firing rates, ri in Fig. 2 where i = 1-100, with the original complete pattern that was learned. This correlation measure ranges from 1.0 for perfect recall to 0.0. (In practice, an autoassociator with associatively modifiable synapses, random fully distributed binary patterns, and the sizes and loading of the networks used here can achieve on average a maximum fitness of approximately 0.996 when half the original pattern is presented as the recall cue.)

Table 2
Performance of pattern association networks without clipping of synaptic weights (a)

          Synaptic weight initialisation option, for different values of t and q
          t=0      t=1, q=0.5   t=1, q=5.0   t=1, q=50.0   t=2, q=0.5   t=2, q=5.0   t=2, q=50.0
Rule 1    0.254    0.261        0.239        0.050         0.255        0.258        0.258
Rule 2    1.000    1.000        0.597        0.023         1.000        1.000        1.000
Rule 3    0.138    0.140        0.132        0.001         0.141        0.135        0.001
Rule 4    1.000    1.000        0.586        0.023         1.000        1.000        1.000
Rule 5    0.260    0.258        0.242        0.051         0.255        0.257        0.257
Rule 6    0.000    0.007        0.007        0.007         0.000        0.000        0.000
Rule 7    0.000    0.007        0.007        0.007         0.000        0.000        0.000

(a) Conventions as in Table 1.
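As an illustration of the completion test described above (before Table 2), a minimal sketch is given below. The binary thresholding by the sparseness gene follows the description in the text, while the exact handling of the external cue on the first iteration is an assumption of this sketch.

    import numpy as np

    def autoassociation_fitness(w, pattern, sparseness=0.5, n_iterations=10):
        # Present the first half of the stored pattern as the recall cue, let the
        # network iterate 10 times with the cue removed after the first iteration,
        # then score the squared correlation with the whole stored pattern.
        n = len(pattern)
        cue = np.zeros(n)
        cue[:n // 2] = pattern[:n // 2]
        r = cue.copy()                               # firing on the first iteration is set by the cue
        n_firing = int(round(sparseness * n))
        for _ in range(n_iterations):
            h = w @ r                                # recurrent activations through the learned weights
            r = np.zeros(n)
            r[np.argsort(h)[-n_firing:]] = 1.0       # binary firing, proportion set by the sparseness gene
        if r.std() == 0 or np.std(pattern) == 0:
            return 0.0
        return max(np.corrcoef(r, pattern)[0, 1], 0.0) ** 2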

The values of the fitness measure and selected genes for a typical run of the simulations are shown in Fig. 8. (In Fig. 8 the genes are normalised in a similar manner to Fig. 6, with the values of the learning rate k11 and weight scaling q11 normalised by 8.74 and 81.10, respectively.) The fitness measure is the fitness of the best genotype in the population. It is shown that evolution to produce genotypes that specify networks with optimal performance occurred over a number of generations. Most runs of the simulation produced networks with optimal performance. The best networks all had close to 100 excitatory feedback connections from the output neurons (class 1) to themselves, which is the minimal number necessary for optimal performance of a fully loaded autoassociator given that there are 100 output neurons, and the maximum number allowed. For the run shown in Fig. 8 the sparseness of the output representation was set to the value of 0.5 by using the sparseness gene set to this value. One jump in performance occurred between generations 8 and 9, when the learning rule f11 altered from 2 (LTP/heterosynaptic LTD) to 4 (covariance), the weight initialisation t11 changed from 1 (uniform probability distribution) to 2 (constant), and the learning rate k11 decreased from 8.0 to 0.98. All these factors, as discussed for the pattern associator, would be expected to improve performance, with the decrease in learning rate ensuring that the weights did not reach the floor or ceiling set by the genes v11 and u11. The increase in fitness from generation 3 to 4 was related to the change of the learning rule f11 from 3 (LTP with LTD depending on the existing synaptic weight) to 2 (LTP with heterosynaptic LTD), to an increase in the number of connections c11 from 71 to 97, and to a change of weight initialisation t11 from 3 (Gaussian distribution) to 0 (constant value of 0). The increase in fitness from generation 4 to 5 was related to a change of the weight initialisation gene t11 from 0 to 1 (a uniform random positive distribution, which would lift the weights in general above the floor of 0), with the noise introduced by the random initialisation being made small relative to the signal by an increase in the learning rate k11. The most successful networks (using synaptic weights clipped to be ≥ 0) typically had genes which specified for the autoassociative feedback connections a learning rule of 4 (covariance) or 2 (LTP with heterosynaptic LTD), and positive constant initial synaptic weights (t11 = 2), or, if a weight initialisation with a uniform probability distribution (t11 = 1) was chosen, then a high learning rate k11, to minimise the synaptic initialisation noise introduced into the autoassociatively learned connections.

Fig. 8. Fitness of the best genotype in the autoassociation task as a function of the number of generations, together with the values of selected genes. The simulation is performed with the firing rates assigned binary values of 0 or 1 according to the neuronal activations, with the proportion of neurons firing set by the firing rate sparsity gene a.

To help understand in full detail the actual gene selections found by the genetic algorithm in the simulations, we simulated the autoassociator with manually selected gene combinations for the recurrent connections from class 1 to class 1. In particular, we calculated the network fitnesses for particular combinations of learning rule f11, weight initialisation option t11, and weight scaling q11. The results of some of these simulations are shown in Tables 3 and 4, both for the situation, normal biologically and in most of the runs, in which the weights were clipped to be ≥ 0, and for the situation in which they were allowed to become negative. (In all cases, the firing rates were ≥ 0.) It is shown in Table 3 that with the weights clipped to be ≥ 0, the best performance (0.99) was obtained with the learning rule f11 set to 2 (LTP plus heterosynaptic long-term depression), or to 4 (covariance), provided that the weights were initialised to a high enough positive constant value (for the simulations performed, with q11 = 5, as indicated in the tables) that when decremented by the learning rule during learning the weights did not hit the floor of 0. (If the weights were not clipped to be ≥ 0 during learning, then both these rules were again the best, but the weight initialisation could in addition be set to zero (t11 = 0) or any constant (t11 = 2), as shown in Table 4.) These results are in accord with what would be predicted (Rolls and Treves, 1998), and help to show why the genetic algorithm tended to produce gene combinations which included rule 2 or rule 4 in combination with weight initialisation to a positive constant sufficiently great to enable the synapses not to hit the floor of 0. (Without clipping of the synaptic weights to be ≥ 0 during learning, weight initialisation to uniform random values could produce good performance with rules 2 and 4, provided that the weights were scaled to low initial values (q11 = 0.5) so that only low amounts of noise were introduced. The learning rate was in all cases in the tables set to 1.)

Table 3
Performance of autoassociation networks with clipping of synaptic weights to be ≥ 0 (a)

          Synaptic weight initialisation option, for different values of t and q
          t=0      t=1, q=0.5   t=1, q=5.0   t=1, q=50.0   t=2, q=0.5   t=2, q=5.0   t=2, q=50.0
Rule 1    0.158    0.152        0.091        0.025         0.154        0.151        0.155
Rule 2    0.240    0.315        0.163        0.008         0.420        0.994        0.994
Rule 3    0.109    0.109        0.108        0.001         0.110        0.108        0.001
Rule 4    0.645    0.745        0.135        0.008         0.924        0.992        0.992
Rule 5    0.102    0.100        0.089        0.025         0.096        0.152        0.147
Rule 6    0.000    0.007        0.007        0.007         0.000        0.000        0.000
Rule 7    0.000    0.000        0.007        0.007         0.000        0.000        0.000

(a) Conventions as in Table 1.

Table 4
Performance of autoassociation networks without clipping of synaptic weights (a)

          Synaptic weight initialisation option, for different values of t and q
          t=0      t=1, q=0.5   t=1, q=5.0   t=1, q=50.0   t=2, q=0.5   t=2, q=5.0   t=2, q=50.0
Rule 1    0.145    0.150        0.092        0.025         0.149        0.150        0.147
Rule 2    0.991    0.990        0.146        0.008         0.991        0.992        0.993
Rule 3    0.109    0.108        0.107        0.001         0.109        0.108        0.001
Rule 4    0.992    0.990        0.153        0.008         0.992        0.996        0.991
Rule 5    0.154    0.150        0.092        0.026         0.148        0.152        0.148
Rule 6    0.000    0.007        0.008        0.007         0.000        0.000        0.000
Rule 7    0.000    0.008        0.007        0.008         0.000        0.000        0.000

(a) Conventions as in Table 1.


The Hebbian associative rule without weight decrementing (f11 = 1) was not chosen for the fit genotypes, given the poor maximum fitness obtainable (shown in Table 3) without the weight decrementing needed to remove the positive correlation introduced by firing rates ≥ 0. The associative learning rule 3, envisaged to be useful in competitive networks, was not selected for the successful genotypes in autoassociation, as it produced only poor maximum fitness values, as shown in Tables 3 and 4.

Some simulation runs were performed in which, instead of using a fixed value of the gene a1 to set the sparseness of the output firing, a sigmoid activation function with genetically specified threshold a1 and slope b1 was allowed to evolve. The part of the solution space containing good genotypes was very small, because with the iterative processing performed by an autoassociation net, any inaccuracy of the threshold would be greatly multiplied to produce cumulative error in the level of firing of the output neurons. For this reason, successful solutions were rarely found by the simulations. We note that in the real brain, feedback inhibition using inhibitory interneurons is used to help control the overall firing of a population of neurons, and we believe that when these neurons are introduced into such simulations in future work, solutions will be found regularly with the sigmoid activation function.

4.6. Competitive networks

The computational problem set was to learn to place into separate categories 20 binary pattern vectors, each 100 elements long. First, five exemplar patterns were randomly generated with 50 1s and 50 0s. Then, for each exemplar, three further patterns were created by mutating 20 of the bits from 0 to 1 or 1 to 0. This gave a total of 20 patterns that could be placed in five similarity categories. There were two classes of neuron. Class 1 was the input class, consisted of 100 neurons, and was set to fire according to which input pattern was being presented. Class 2 was the output class. Connections were possible from class 1 to 2, from class 1 to 1, from class 2 to 1, and from class 2 to 2. The network shown in Fig. 3, if trained with learning rule 3 and given a powerful and widespread lateral inhibition scheme, can solve this problem (Rolls and Treves, 1998). During learning, each 100-element input pattern was presented to force class 1 neurons to fire, then the firing rates of the class 2 neurons were calculated, and then synaptic modification was allowed to occur by whichever learning rules were selected for that genotype. However, during learning and testing, the firing rates of neurons in class 2 were initially set to 0, corresponding to no firing before the cue was applied, and this had the effect of eliminating the effect of connections from class 2 to classes 1 and 2. The fitness measure used assessed how well the competitive net was able to classify the 20 patterns into their respective five categories. A suitable measure of how closely two output firing rate vectors are aligned with each other is the cosine of the angle between them; this measure ranges from 0 (for orthogonal vectors) to 1 (for parallel vectors) (Rolls and Treves, 1998). Hence, the fitness measure was set equal to the average of the cosine of the angle between the output firing rate vectors for patterns from the same category, minus the average of the cosine of the angle between the output firing rate vectors for patterns from different categories (clipped at zero). Each network was trained for 20 epochs, where each epoch consisted of training with every input pattern chosen in sequence.

Fig. 9. Fitness of the best genotype in the competitive network task as a function of the number of generations, together with the values of selected genes. Evolution led to the choice of learning rule 3 for the best genotype.
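A sketch of the category-pattern generation and of the cosine-based fitness measure described above might look as follows; the averaging over all pairs of patterns is an assumption of this sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_category_patterns(n_exemplars=5, n_variants=3, n_elements=100, n_flips=20):
        # Five random exemplars with 50 ones and 50 zeros; three further patterns per
        # exemplar are made by flipping 20 randomly chosen bits, giving 20 patterns
        # in five similarity categories.
        patterns, labels = [], []
        for cat in range(n_exemplars):
            exemplar = np.zeros(n_elements)
            exemplar[:n_elements // 2] = 1
            rng.shuffle(exemplar)
            patterns.append(exemplar); labels.append(cat)
            for _ in range(n_variants):
                p = exemplar.copy()
                flip = rng.choice(n_elements, size=n_flips, replace=False)
                p[flip] = 1 - p[flip]
                patterns.append(p); labels.append(cat)
        return np.array(patterns), np.array(labels)

    def cosine(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return 0.0 if na == 0 or nb == 0 else float(a @ b) / (na * nb)

    def competitive_fitness(outputs, labels):
        # Average within-category cosine minus average between-category cosine,
        # clipped at zero, over all pairs of output firing-rate vectors.
        same, diff = [], []
        for i in range(len(outputs)):
            for j in range(i + 1, len(outputs)):
                (same if labels[i] == labels[j] else diff).append(cosine(outputs[i], outputs[j]))
        return max(float(np.mean(same)) - float(np.mean(diff)), 0.0)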

The values of the fitness measure and selected genes for a typical run of the simulation are shown in Fig. 9. The genes shown are key connection properties from class 1 to class 2, and are: learning rate k21, number of connections c21, weight scaling q21, weight initialisation option t21, and the learning rule f21. (In Fig. 9 the genes are normalised in a similar manner to Fig. 6, with the values of the learning rate k21 and weight scaling q21 normalised by 8.44 and 68.64, respectively.) The fitness measure is the fitness of the best genotype in the population. This run is for a fixed value of the sparseness in class 2, which was set to 0.2. It is shown that evolution to produce a genotype that specifies networks with good performance occurred over a number of generations. Runs typically converged to use learning rules 2 or 3, although learning rule 3 appeared to offer the best performance. Learning rule 3 involves LTP, plus LTD proportional to the existing value of the synaptic weight. This is the only rule which actually encourages the synaptic weight vectors of the different output (class 2) neurons to have a similar and limited total value. Use of a learning rule such as this (or explicit normalisation of the length of the synaptic weight vector of each neuron) is advantageous in competitive networks, because it prevents any one neuron from learning to respond to all the input patterns (Rolls and Treves, 1998). The fact that this learning rule was never chosen by the successful genotypes in the associative nets, but was generally chosen for the competitive nets, is evidence that the genetic evolution was leading to appropriate architectures for different computational problems. Learning rule 0 (no learning) was never chosen by the successful genotypes, and in addition we verified that good solutions to the problem set could not be produced without learning. With other more regular patterns, the actual values of the synaptic weight vectors produced after learning in the networks specified by the successful genotypes confirmed that conventional competitive networks, operating as described by Rolls and Treves (1998), were being produced in the simulations. Another characteristic of the successful genotypes was that there appeared to be less selection pressure on c21, the number of connections to a class 2 from a class 1 neuron, with some highly fit nets having relatively low values for c21. Such diluted connectivity can be useful in competitive networks, because it helps to ensure that different input patterns activate different output neurons (Rolls and Treves, 1998). Low values were never chosen for the best genotypes in the associative networks, where full connectivity is advantageous for full capacity. The fact that partial connectivity was often chosen for the competitive networks accounts for the fact that these nets could allow the synaptic weight initialisation to have a constant positive value. (With full connectivity, the weight initialisation has to produce different weight vectors for different neurons, so that the first time a set of patterns is applied, each pattern produces different firing of the output neurons.)

5. Discussion

The simulations described here show that the overall conception of using genes which control processes of the type described can, when combinations are searched through using genetic evolution utilising a single performance measure, lead to the specification of neural networks that can solve different computational problems, each of which requires a different functional architecture. The actual genes hypothesized were based on a comparison of the functional neuronal network architectures of many mammalian brain regions (see e.g. Rolls and Treves, 1998 for examples of these), and were well able to specify the architectures described here. However, the research described is intended as a foundation for further exploration of exactly which genes are used biologically to specify real architectures in the mammalian brain which perform particular computations. We note in this context that it may be the case that exhaustive analysis of the genetic specification of an invertebrate with a small number of neurons does not reveal the principles involved in building complex nervous systems and networks of the type addressed here. The genetic specification for simple nervous systems could be considerably different, with much more genetic specification of the properties of particular synapses to produce a particular, more fixed network, in terms both of which identified neuron is connected to which, and of what the value of the synaptic connection strength between each pair of neurons is. In contrast, in the approach taken here, there are no identifiable particular neurons with particular connections to other identifiable neurons, but instead a specification of the general statistics of the connectivity, with large numbers of neurons connecting according to general rules, and the performance of the network being greatly influenced by learning of the appropriate values of the connection strengths between neurons, based on the co-activity of neurons and the nature of the modifiable synaptic connection between them.

The particular hypotheses about the genes that could specify the functional architecture of simple networks in the brain were shown by the work described here to be sufficient to specify some different neuronal network architectures for use in a system that builds computationally useful architectures using a gene selection process based on fitness. The genes hypothesized may be taken as a guide to the types of genes that could be used to specify the functional architecture of real biological nervous systems, but the gene specification postulated and simulated can be simply revised based on empirical discoveries about the real genes. One choice made here was to specify the genes with respect to the receiving neuron. The reason for this, rather than specification with respect to the sending neuron, was that many of the controlling properties of the network architecture and performance are in the post-synaptic neuron. Examples of these properties include the type of post-synaptic receptor (e.g., NMDA receptors to specify associatively modifiable connections), and the cable properties of the post-synaptic neuron. The specification of the receiving neuron does include the appropriate information about the sending neuron, in that for example gene clm does specify (by identifier m) the identity of the sending population. This implies that when a neuron makes an output connection (through its synapses), the neuron class of the sending neuron is available (as a chemical recognition identifier) at the synaptic terminal. However, in principle, in terms of actually building a network, it would be possible to have the genetic specifiers listed as properties of the sending neuron. Another aspect of the gene specifiers hypothesized here is that there was no gene for how many output connections a class of neurons should make to another class. One reason for this is that the networks can be built numerically without such a specifier, as it is unnecessary if the total numbers of neurons of two classes are specified, and the number of connections received by one class of neuron from the other is specified. However, the number of output connections made to another class of neuron might be specified in real biological systems, and neuron numbers might adjust to reflect this type of constraint, which is a possible way to think about cell death during development. If there is local output connectivity within a certain spatial range, this will also act as a factor that determines the number of output connections made by neurons, and indeed this does seem a possible gene in real biological systems. Indeed, in real systems the single gene specifying the region of connectivity (rlm) here might be replaced by different spatial genes specifying the extent of the dendrites of the receiving neuron, and the extent of the axonal arborization for the outputs.

We emphasise that a number of the genes specified here for the simulations might be replaced in real biological systems with several genes operating at a lower level, or in more detail, to implement the function proposed for each gene specified here. The reason for selecting the particular genes hypothesized here is that they serve as a guide to how in principle neuronal networks could be specified genetically in complex neural systems where the functional architecture is being selected for by genetic evolution. Specification of genes as operating at the level described in this paper provides a useful heuristic for investigation by simulation of how evolution may operate to produce neuronal networks with particular computational properties. Once the processes have been investigated and understood with genes operating at the level described here, subsequent investigations that use genes that are more and more biologically accurate and detailed would be facilitated, and are envisaged. The approach allows continual updating of the genotype used in the simulations to help understand the implications of new discoveries in neurobiology.

We also emphasise that the way in which the genetics specifies the neural network is very different from that used in some previous approaches, in which there was a fixed architecture, and the genes were allowed to evolve the values of the synaptic weights (Nolfi et al., 1994; Floreano and Mondada, 1994; Lund and Parisi, 1996; Kuwana et al., 1996; Huber et al., 1996). The approach taken here, in contrast, is to allow the genes to specify the functional architecture, including which classes of neuron are connected to each other, and the synaptic learning rule involved in each such set of connections. The network is then built according to the architecture specified genetically, and tested to determine whether it can learn the appropriate values of the synaptic weights to solve the problem. The approach used here thus requires many fewer genes than would be required to specify actual synaptic weight values, and moreover allows the individual phenotype to adapt itself by learning in the particular environment in which it finds itself. There is thus some adaptability within the lifetime of the phenotype, which is a great advantage. The adaptive changes which occur within the lifetime of the individual can therefore be used for such processes as using early visual experience to help shape the visual system; learning by pattern association which visual stimuli are actually associated with primary reinforcers; semantic learning; and even learning of particular languages, all of which are likely to take place optimally by specifying a general architecture, and then allowing learning by the individual in the actual environment it finds, which is the approach taken here.

We now consider the number of genes used in simulations of the type described here, remembering that biologically several genes might be required to specify at least some of the genes used in the simulations. The number of genes for each class of neuron in the simulations is currently 4 + (the number of classes of neuron from which the class receives connections x 14). In the brain, if there were on average five classes of neuron from which each class of neuron received connections, this would result in 4 + 5 x 14 = 74 genes per class of neuron. If there were on average five classes of neuron per network, this would yield 370 genes per network. 100 such networks would require 37,000 genes, though in practice fewer might be needed, as some parameters might be assigned to be the same for many classes of neuron in different networks. For example, the overall functional architecture of the cerebral neocortex appears to follow the same general design for different architectonic areas (Rolls and Treves, 1998; Braitenberg and Schuz, 1991; Peters and Jones, 1984; Somogyi et al., 1998; Douglas and Martin, 1990), so much of the neocortex may use one generic network specification, with a few genes used to tweak the parameters for each different architectonic area. (An extension of the idea presented in this paper would allow such a generic specification of the architecture of an area of the cerebral neocortex, and then allow genes to connect different areas with the same generic though locally tweaked architecture.) A factor working in the opposite direction is the fact that a single gene taking an integer or real value was used to specify some of the parameters in the simulation, whereas biologically several genes might be needed to represent such a wide range of parameter values with low, though sufficient, precision, as described in Section 4.3.

Although with 100 individuals per generation, and approximately 64 genes for each genotype (for networks with two classes of neuron, each connected to the other as well as to themselves), each with at least several possible values, the search might be predicted to be long, in practice the genetic search was often quite short. Part of the reason for this is that some genes made a crucial difference, for example the gene specifying the learning rule, and were selected early on in evolution, usually being present in at least some of the genotypes in the initial population. Because these genes had such a dramatic effect on fitness, they were almost always selected for the next generation, and successive generations then had to search the smaller space of the remaining genes, only some of which had a substantial impact on performance. This effect can be seen in Figs. 6-9, which show for the different networks examples of the variation in the value of different genes as a function of the generation number. Some genes, such as that specifying the learning rule, do indeed often settle down early, and changes in these genes can sometimes be seen to be reflected in jumps in the fitness measure. For other genes, where the impact is less or zero, the values of the genes can fluctuate throughout evolution, because they are not under selection pressure, in that they do not contribute greatly to fitness.

Although therefore in some senses the actual evolution investigated here takes place rapidly and apparently quite easily, this is part of the point of this paper: to demonstrate that the particular genes identified as those which might allow the networks found in different brain areas to be built, and implemented in the simulations, can actually build these networks, and moreover can do so efficiently when using genetic search. However, the networks tested here are single layer networks, and multilayer networks, with correspondingly more parameter combinations to be explored, will doubtless take longer to evolve in future simulations. Genetic search of the parameter space should then become even more evident as a good procedure, because of its power in searching high dimensional spaces with many local optima and in which the parameters combine non-linearly; because of its use of a single measure of performance; and because of its ability to reuse genetic codes still present in a diverse population but originally developed for a different purpose. The intention here, however, is to specify the principles involved. (In fact, we were on a number of occasions surprised by the power of the genetic search when it sometimes found unanticipated solutions to problems by combining particular genes originally provided in the genome by us for quite different anticipated uses.) Of course, there are many more details of the networks that could be explored by the existing gene specification, including inhibitory interneurons. Simulations of this type will not only enable the evolution and development of more complex multilayer networks to be explored, but will also enable the utility of the specification of different parameters by different genes to be explored, including details of where inputs end on dendrites and dendritic biophysical properties, and thus eventually networks that operate to simulate also the dynamics of the operation of real neural networks in the brain (Treves, 1993; Rolls and Treves, 1998; Battaglia and Treves, 1998; Panzeri et al., 2000).

Acknowledgements

This research was supported by the Medical Research Council, grant PG8513790, by the MRC Interdisciplinary Research Centre for Cognitive Neuroscience, and by the Human Frontier Science Program.

References

Ackley, D.H., 1987. A Connectionist Machine for Genetic Hill-Climbing. Kluwer Academic Publishers, Dordrecht.

Battaglia, F., Treves, A., 1998. Rapid stable retrieval in high-capacity realistic associative memories. Neural Computation 10, 431-450.

Bienenstock, E.L., Cooper, L.N., Munro, P.W., 1982. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2, 32-48.

Braitenberg, V., Schuz, A., 1991. Anatomy of the Cortex. Springer-Verlag, Berlin.

Buhl, E.H., Halasy, K., Somogyi, P., 1994. Diverse sources of hippocampal unitary inhibitory postsynaptic potentials and the number of synaptic release sites. Nature 368, 823-828.

Buonomano, D.V., Merzenich, M.M., 1998. Cortical plasticity: from synapses to maps. Annual Review of Neuroscience 21, 149-186.

Douglas, R., Martin, K.A.C., 1990. Neocortex. In: Shepherd, G. (Ed.), The Synaptic Organisation of the Brain, 3rd ed. Oxford University Press, New York, Oxford, pp. 389-438 (Chapter 12).

Floreano, D., Mondada, F., 1994. Automatic creation of an autonomous agent: genetic evolution of a neural-network driven robot. In: Cliff, D., Husbands, P., Meyer, J., Wilson, S. (Eds.), From Animals to Animats 3. Proceedings of the Third International Conference on Simulation of Adaptive Behaviour. MIT Press, Cambridge, MA.

Foldiak, P., 1991. Learning invariance from transformation sequences. Neural Computation 3, 194-200.

Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading.

Gracias, N., Pereira, H., Lima, J.A., Rosa, A., 1996. Gaia: an artificial life environment for ecological systems simulation. In: Langton, C.G., Shimohara, K. (Eds.), Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems. MIT Press, Cambridge, MA.

Hertz, J., Krogh, A., Palmer, R.G., 1990. Introduction to the Theory of Neural Computation. Santa Fe Institute. Addison-Wesley, Reading, MA.

Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press.

Hoshino, T., Tsuchida, M., 1996. Manifestation of neutral genes in evolving robot navigation. In: Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems. MIT Press, Cambridge, MA.

Huber, S.A., Mallot, H.A., Bulthoff, H.H., 1996. Evolution of the sensorimotor control in an autonomous agent. In: Maes, P., Mataric, M.J., Meyer, J., Pollack, J., Wilson, S.W. (Eds.), From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behaviour. MIT Press, Cambridge, MA.

Husbands, P., Smith, T., Jakobi, N., O'Shea, M., 1998. Better living through chemistry: evolving gasnets for robot control. Connection Science 10 (4), 1-40.

Koch, C., 1999. Biophysics of Computation. Oxford University Press, Oxford.

Kohonen, T., 1995. Self-Organizing Maps. Springer-Verlag, Berlin.

Kuwana, Y., Shimoyama, I., Sayama, Y., Miura, H., 1996. A robot that behaves like a silkworm moth in the pheromone stream. In: Langton, C.G., Shimohara, K. (Eds.), Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems. MIT Press, Cambridge, MA.

Lund, H.H., Parisi, D., 1996. Generalist and specialist behaviour due to individual energy extracting abilities. In: Langton, C.G., Shimohara, K. (Eds.), Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems. MIT Press, Cambridge, MA.

Maynard Smith, J., Szathmary, E., 1999. The Origins of Life. Oxford University Press, Oxford.

McClelland, J.L., McNaughton, B.L., O'Reilly, R.C., 1995. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review 102, 419-457.

Nolfi, S., Elman, J., Parisi, D., 1994. Learning and evolution in neural networks. Adaptive Behaviour 3, 5-28.

Panzeri, S., Rolls, E.T., Battaglia, F., Lavis, R., 2000. Speed of information retrieval in multilayer networks of integrate-and-fire neurons (submitted).

Peters, A., Jones, E.G. (Eds.), 1984. Cerebral Cortex, vol. 1. Plenum Press, New York.

Rolls, E.T., 1992. Neurophysiological mechanisms underlying face processing within and beyond the temporal cortical areas. Philosophical Transactions of the Royal Society London [B] 335, 11-21.

Rolls, E.T., 1999. The Brain and Emotion. Oxford University Press, Oxford.

Rolls, E.T., Milward, T., 2000. A model of invariant object recognition in the visual system: learning rules, activation functions, lateral inhibition and information-based performance measures. Neural Computation (in press).

Rolls, E.T., Stringer, S.M., 2000. Invariant object recognition in the visual system with error correction and temporal difference learning (submitted for publication).

Rolls, E.T., Treves, A., 1998. Neural Networks and Brain Function. Oxford University Press, Oxford.

Shepherd, G.M., 1990. The Synaptic Organisation of the Brain, 3rd ed. Oxford University Press, Oxford.

Somogyi, P., Tamas, G., Lujan, R., Buhl, E.H., 1998. Salient features of synaptic organisation in the cerebral cortex. Brain Research Reviews 26, 113-135.

Treves, A., 1993. Mean-field analysis of neuronal spike dynamics. Network 4, 259-284.

Ullman, S., 1996. High-Level Vision. MIT Press, Cambridge, MA.

Vonk, E., Jain, L.C., Johnson, R.P., 1997. Automatic Generation of Neural Network Architecture Using Evolutionary Computation. World Scientific, Singapore.

Wallis, G., Rolls, E.T., 1997. A model of invariant object recognition in the visual system. Progress in Neurobiology 51, 167-194.

Wang, J.H., Ko, G.Y., Kelly, P.T., 1997. Cellular and molecular bases of memory: synaptic and neuronal plasticity. Journal of Clinical Neurophysiology 14, 264-293.
